Voice activity detection.



A voice activity detector (VAD) for use in an LPC coder in a mobile radio system uses autocorrelation coefficients R_0, R_i of the input signal, weighted and combined, to provide a measure M which depends on the power within that part of the spectrum containing no noise. M is thresholded against a variable threshold to provide a speech/no-speech logic output. The measure is

$$ M = R_0 H_0 + 2\sum_{i=1}^{N} R_i H_i $$

where H_i are the autocorrelation coefficients of the impulse response of an Nth-order FIR inverse noise filter derived from LPC analysis of previous non-speech signal frames. Threshold adaption and coefficient update are controlled by a second VAD responsive to the rate of spectral change between frames.
VOICE ACTIVITY DETECTION



A voice activity detector is a device which is supplied with a signal with the object of detecting periods of speech, or periods containing only noise. Although the present invention is not limited thereto, one application of particular interest for such detectors is in mobile radio telephone systems, where the knowledge as to the presence or otherwise of speech can be exploited by a speech coder to improve the efficient utilisation of radio spectrum, and where also the noise level (from a vehicle-mounted unit) is likely to be high.

The essence of voice activity detection is to locate a measure which differs appreciably between speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are readily available from one or other stage of the coder, and it is therefore desirable to economise on the processing needed by utilising some such parameter. In many environments, the main noise sources occur in known, defined areas of the frequency spectrum. For example, in a moving car much of the noise (e.g. engine noise) is concentrated in the low-frequency regions of the spectrum. Where such knowledge of the spectral position of noise is available, it is desirable to base the decision as to whether speech is present or absent upon measurements taken from that portion of the spectrum which contains relatively little noise. It would, of course, be possible in practice to pre-filter the signal before analysing it to detect speech activity, but where the voice activity detector follows the output of a speech coder, prefiltering would distort the voice signal to be coded.

According to a first aspect of the invention there is provided voice activity detection apparatus 
comprising means for receiving an input signal, means for estimating the noise signal component of the 
input signal, means for continually forming a measure M of the spectral similarity between a portion of the
input signal and the noise signal, and means for comparing a parameter derived from the measure M with a 
threshold value T to produce an output to indicate the presence or absence of speech in dependence upon 
whether or not that value is exceeded. 

According to a second aspect of the invention there is provided voice activity detection apparatus comprising: means for continually forming a spectral distortion measure of the similarity between a portion of the input signal and earlier portions of the input signal, and means for comparing the degree of variation between successive values of the measure with a threshold value to produce an output indicating the presence or absence of speech in dependence upon whether or not that value is exceeded.
Preferably, the measure is the Itakura-Saito Distortion Measure.
Other aspects of the present invention are as defined in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the 
accompanying drawings, in which: 

Figure 1 is a block diagram of a first embodiment of the invention; 
Figure 2 shows a second embodiment of the invention; 
Figure 3 shows a third, preferred embodiment of the invention.

The general principle underlying a first Voice Activity Detector according to a first embodiment of the invention is as follows.

A frame of n signal samples (s_0, s_1, s_2, s_3, s_4, ..., s_{n-1}) will, when passed through a notional fourth-order finite impulse response (FIR) digital filter of impulse response (1, h_0, h_1, h_2, h_3), result in a filtered signal (ignoring samples from previous frames)

$$ s'_j = s_j + h_0 s_{j-1} + h_1 s_{j-2} + h_2 s_{j-3} + h_3 s_{j-4} $$
The zero-order autocorrelation coefficient is the sum of the squares of the terms, which may be normalised, i.e. divided by the total number of terms (for constant frame lengths it is easier to omit the division); that of the filtered signal is thus

$$ R'_0 = \sum_j (s'_j)^2 $$

and this is therefore a measure of the power of the notional filtered signal s', in other words, of that part of the signal s which falls within the passband of the notional filter.

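Expanding, neglecting the first 4 terms,

$$
\begin{aligned}
R'_0 &= (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 \\
&\quad + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 + \cdots \\
&= s_4^2 + h_0 s_4 s_3 + h_1 s_4 s_2 + h_2 s_4 s_1 + h_3 s_4 s_0 \\
&\quad + h_0 s_4 s_3 + h_0^2 s_3^2 + h_0 h_1 s_3 s_2 + h_0 h_2 s_3 s_1 + h_0 h_3 s_3 s_0 \\
&\quad + h_1 s_4 s_2 + h_0 h_1 s_3 s_2 + h_1^2 s_2^2 + h_1 h_2 s_2 s_1 + h_1 h_3 s_2 s_0 \\
&\quad + h_2 s_4 s_1 + h_0 h_2 s_3 s_1 + h_1 h_2 s_2 s_1 + h_2^2 s_1^2 + h_2 h_3 s_1 s_0 \\
&\quad + h_3 s_4 s_0 + h_0 h_3 s_3 s_0 + h_1 h_3 s_2 s_0 + h_2 h_3 s_1 s_0 + h_3^2 s_0^2 \\
&\quad + \cdots \\
&= R_0 (1 + h_0^2 + h_1^2 + h_2^2 + h_3^2) \\
&\quad + R_1 (2h_0 + 2h_0 h_1 + 2h_1 h_2 + 2h_2 h_3) \\
&\quad + R_2 (2h_1 + 2h_1 h_3 + 2h_0 h_2) \\
&\quad + R_3 (2h_2 + 2h_0 h_3) \\
&\quad + R_4 (2h_3)
\end{aligned}
$$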

So R'_0 can be obtained from a combination of the autocorrelation coefficients R_i, weighted by the bracketed constants, which determine the frequency band to which the value of R'_0 is responsive. In fact the bracketed terms are the autocorrelation coefficients of the impulse response of the notional filter, so that the expression above may be simplified to

$$ R'_0 = R_0 H_0 + 2\sum_{i=1}^{N} R_i H_i \qquad (1) $$

where N is the filter order and H_i are the (un-normalised) autocorrelation coefficients of the impulse response of the filter.

In other words, the effect on the signal autocorrelation coefficients of filtering a signal may be simulated 
by producing a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the impulse 
response that the required filter would have had. 

Thus, a relatively simple algorithm, involving a small number of multiplication operations, may simulate 
the effect of a digital filter requiring typically a hundred times this number of multiplication operations. 
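Equation 1 can be checked numerically. The sketch below is a minimal illustration assuming NumPy; the helper names `autocorr` and `filtered_power` and the tap values are hypothetical, chosen only for the demonstration. It filters a frame directly with `np.convolve` (default "full" mode, i.e. zero initial state with the output tail kept) and compares the output energy with the weighted autocorrelation sum. Because the full convolution keeps every edge term, the two routes agree exactly here; neglecting the edge outputs, as in the derivation above, makes the agreement approximate.

```python
import numpy as np


def autocorr(x, max_lag):
    """Un-normalised autocorrelation R_0..R_max_lag of a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


def filtered_power(R, H):
    """Equation 1: R'_0 = R_0*H_0 + 2*sum_{i=1..N} R_i*H_i."""
    return R[0] * H[0] + 2.0 * np.dot(R[1:], H[1:])


rng = np.random.default_rng(0)
s = rng.standard_normal(160)                  # one frame of n = 160 samples
h = np.array([1.0, -0.5, 0.25, 0.1, -0.05])   # illustrative taps (1, h0, h1, h2, h3)
N = len(h) - 1                                # filter order (4 here)

# Direct route: run the FIR filter over the frame and sum the squared output.
direct = np.sum(np.convolve(s, h) ** 2)

# Equation 1 route: N + 1 multiplications once R_i and H_i are available.
via_eq1 = filtered_power(autocorr(s, N), autocorr(h, N))
print(direct, via_eq1)
```

In a coder, R_i comes from the LPC analysis anyway, so only the final weighted sum is extra work.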

This filtering operation may alternatively be viewed as a form of spectrum comparison, with the signal 
spectrum being matched against a reference spectrum (the inverse of the response of the notional filter). 
Since the notional filter in this application is selected so as to approximate the inverse of the noise 
spectrum, this operation may be viewed as a spectral comparison between speech and noise spectra, and 
the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse filtered signal) as a 
measure of dissimilarity between the spectra. The Itakura-Saito distortion measure is used in LPC to assess 
the match between the predictor filter and the input spectrum, and in one form is expressed as 



$$ M = R_0 A_0 + 2\sum_{i=1}^{N} R_i A_i $$

where A_0, A_1, ... are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is closely similar to the relationship derived above. When it is remembered that the LPC coefficients are the taps of an FIR filter having the inverse spectral response of the input signal, so that the LPC coefficient set is the impulse response of the inverse LPC filter, it will be apparent that the Itakura-Saito Distortion Measure is in fact merely a form of Equation 1, wherein the filter response H is the inverse of the spectral shape of an all-pole model of the input signal.
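A sketch of the Itakura-Saito measure in this form, assuming NumPy. The LPC inverse-filter taps (1, a_1, ..., a_N) are obtained here with the standard Levinson-Durbin recursion, which is one common way of performing autocorrelation-method LPC analysis (the text does not name an algorithm). M is then formed from the frame autocorrelation R_i and the tap autocorrelation A_i. The signals and helper names are hypothetical. When a frame is matched against its own LPC filter, M equals the LPC residual energy, and any other inverse filter gives a value at least as large, which is why M can serve as a spectral dissimilarity score.

```python
import numpy as np


def autocorr(x, max_lag):
    """Un-normalised autocorrelation R_0..R_max_lag of a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


def levinson(r, order):
    """Levinson-Durbin: inverse-filter taps (1, a_1..a_N) and residual energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                          # reflection (parcor) coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]     # right-hand side uses the old taps
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def itakura_saito(R, a):
    """M = R_0*A_0 + 2*sum R_i*A_i, with A_i the autocorrelation of taps a."""
    A = autocorr(a, len(a) - 1)
    return R[0] * A[0] + 2.0 * np.dot(R[1:], A[1:])


rng = np.random.default_rng(1)
order = 8
# Illustrative coloured frame (white noise through a short FIR) and a white one.
frame = np.convolve(rng.standard_normal(160), [1.0, 1.2, 0.6])[:160]
other = rng.standard_normal(160)

R = autocorr(frame, order)
a_own, err = levinson(R, order)
a_other, _ = levinson(autocorr(other, order), order)

m_own = itakura_saito(R, a_own)        # equals the LPC residual energy
m_other = itakura_saito(R, a_other)    # a mismatched inverse filter leaves more power
print(m_own, err, m_other)
```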

In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral similarity.

The I-S distortion measure is further discussed in "Speech Coding based upon Vector Quantisation" by A. Buzo, A. H. Gray, R. M. Gray and J. D. Markel, IEEE Trans. on ASSP, Vol. ASSP-28, No. 5, October 1980.

Since the frames of signal have only a finite length, and a number of terms (N, where N is the filter order) are neglected, the above result is an approximation only; it gives, however, a surprisingly good indicator of the presence or absence of speech and thus may be used as a measure M in speech detection. In an environment where the noise spectrum is well known and stationary, it is quite possible simply to employ fixed coefficients h_0, h_1, etc. to model the inverse noise filter.
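As an illustration of fixed coefficients, a sketch assuming NumPy: the simplest high-pass inverse filter (1, -1), i.e. h_0 = -1, has autocorrelation H_0 = 2, H_1 = -1, so Equation 1 reduces to M = 2R_0 - 2R_1. This filter is a hypothetical choice made only because it suppresses low-frequency (e.g. engine) noise; the two sinusoids stand in for noise and for speech energy.

```python
import numpy as np


def autocorr(x, max_lag):
    """Un-normalised autocorrelation R_0..R_max_lag of a finite frame."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


H_FIXED = np.array([2.0, -1.0])    # autocorrelation of the taps (1, -1)


def fixed_measure(frame):
    """Equation 1 with fixed weights: M = 2*R_0 - 2*R_1."""
    R = autocorr(frame, 1)
    return R[0] * H_FIXED[0] + 2.0 * R[1] * H_FIXED[1]


t = np.arange(160)
low = np.sin(2 * np.pi * 0.01 * t)     # stand-in for low-frequency engine noise
high = np.sin(2 * np.pi * 0.3 * t)     # stand-in for higher-frequency speech energy

# Ratio of "passed" power to total frame power for each stand-in signal.
ratio_low = fixed_measure(low) / autocorr(low, 0)[0]
ratio_high = fixed_measure(high) / autocorr(high, 0)[0]
print(ratio_low, ratio_high)
```

The low-frequency frame leaves almost nothing in M, while the higher-frequency frame passes more than its own power, which is the contrast the thresholder relies on.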

However, apparatus which can adapt to different noise environments is much more widely useful. Referring to Figure 1, in a first embodiment a signal from a microphone (not shown) is received at an input 1 and converted to digital samples s at a suitable sampling rate by an analogue-to-digital converter 2. An LPC analysis unit 3 (in a known type of LPC coder) then derives, for successive frames of n (e.g. 160) samples, a set of N (e.g. 8 or 12) LPC filter coefficients L_i, which are transmitted to represent the input speech. The speech signal s also enters a correlator unit 4 (normally part of the LPC coder 3, since the autocorrelation vector R_i of the speech is usually produced as a step in the LPC analysis, although it will be appreciated that a separate correlator could be provided). The correlator 4 produces the autocorrelation vector R_i, including the zero-order correlation coefficient R_0 and at least 2 further autocorrelation coefficients R_1, R_2, R_3. These are then supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located distant from the speaker so as to receive only background noise. The input from this microphone is converted to a digital input sample train by A/D converter 12 and LPC-analysed by a second LPC analyser 13. The "noise" LPC coefficients produced by analyser 13 are passed to correlator unit 14, and the autocorrelation vector thus produced is multiplied term by term with the autocorrelation coefficients R_i of the input signal from the speech microphone in multiplier 5. The weighted coefficients thus produced are combined in adder 6 according to Equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only microphone (which in practice is the same as the shape of the noise spectrum in the signal-plus-noise microphone) and thus filter out most of the noise. The resulting measure M is thresholded by thresholder 7 to produce a logic output 8 indicating the presence or absence of speech; if M is high, speech is deemed to be present.
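A minimal sketch of the Figure 1 signal path, assuming NumPy. The noise microphone (input 11) feeds an LPC analysis (analyser 13, done here with the Levinson-Durbin recursion), the tap autocorrelation (correlator 14) weights the main-microphone autocorrelation (multiplier 5 and adder 6), and the result is compared with a threshold (thresholder 7). The first-order low-pass noise, the sinusoid standing in for speech, the helper names and the fixed threshold value are all illustrative assumptions, not values from the text.

```python
import numpy as np


def autocorr(x, max_lag):
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


def levinson(r, order):
    """Levinson-Durbin recursion returning inverse-filter taps (1, a_1..a_N)."""
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lowpass_noise(rng, n, coef=0.9):
    """Hypothetical engine-like noise: white noise through a one-pole low-pass."""
    y, prev = np.empty(n), 0.0
    for t, w in enumerate(rng.standard_normal(n)):
        prev = w + coef * prev
        y[t] = prev
    return y


def measure(R, A):
    """Equation 1 (multiplier 5 and adder 6)."""
    return R[0] * A[0] + 2.0 * np.dot(R[1:], A[1:])


rng = np.random.default_rng(2)
order, n = 8, 160
# Analyser 13 + correlator 14, fed from the noise-only microphone (input 11).
A = autocorr(levinson(autocorr(lowpass_noise(rng, n), order), order), order)

noise_frame = lowpass_noise(rng, n)                      # main mic, no speech
speech_frame = lowpass_noise(rng, n) + 3.0 * np.sin(2 * np.pi * 0.4 * np.arange(n))

T = 480.0   # hypothetical fixed threshold for thresholder 7
m_noise = measure(autocorr(noise_frame, order), A)
m_speech = measure(autocorr(speech_frame, order), A)
print(m_noise > T, m_speech > T)   # logic output 8 for each frame
```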

This embodiment does, however, require two microphones and two LPC analysers, which adds to the expense and complexity of the equipment necessary.

Alternatively, another embodiment uses a corresponding measure formed using the autocorrelations 
from the noise microphone 11 and the LPC coefficients from the main microphone 1, so that an extra 
autocorrelator rather than an LPC analyser is necessary. 

These embodiments are therefore able to operate within different environments having noise at different frequencies, or within a changing noise spectrum in a given environment.

Referring to Figure 2, in the preferred embodiment of the invention there is provided a buffer 15 which stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input 1 in a period identified as being a "non-speech" (i.e. noise-only) period. These coefficients are then used to derive a measure using Equation 1, which of course also corresponds to the Itakura-Saito Distortion Measure, except that a single stored frame of LPC coefficients corresponding to an approximation of the inverse noise spectrum is used, rather than the present frame of LPC coefficients.

The LPC coefficient vector L_i output by analyser 3 is also routed to a correlator 14, which produces the autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the speech/non-speech output of thresholder 7, in such a way that during "speech" frames the buffer retains the "noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to update the buffer, for example by a multiple switch 16, via which the outputs of the correlator 14, each carrying one autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator 14 could be positioned after buffer 15. Further, the speech/no-speech decision for coefficient update need not be taken from output 8, but could be (and preferably is) otherwise derived.

Since frequent periods without speech occur, the LPC coefficients stored in the buffer are updated from time to time, so that the apparatus is capable of tracking changes in the noise spectrum. It will be appreciated that such updating of the buffer may be necessary only occasionally, or may occur only once at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary over time; in a mobile radio environment, however, frequent updating is preferred.
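The buffer control of Figure 2 can be sketched as a small class, assuming NumPy. `NoiseModelBuffer` is a hypothetical name: it holds the autocorrelation of the stored noise-period inverse-filter taps (buffer 15), uses it in Equation 1, and replaces it only when the frame has been judged noise-only (switch 16 closed). The initial weights (2, -1) correspond to the fixed high-pass taps (1, -1) and are an illustrative starting point.

```python
import numpy as np


class NoiseModelBuffer:
    """Hypothetical model of buffer 15 and switch 16 in Figure 2."""

    def __init__(self, initial_A):
        self.A = np.asarray(initial_A, dtype=float)

    def measure(self, R):
        """Equation 1 using the stored noise-period coefficients."""
        R = np.asarray(R, dtype=float)
        return R[0] * self.A[0] + 2.0 * np.dot(R[1:], self.A[1:])

    def offer(self, A_current, is_noise):
        """Switch 16: copy the current coefficients in only on noise frames."""
        if is_noise:
            self.A = np.asarray(A_current, dtype=float)
        return is_noise


buf = NoiseModelBuffer([2.0, -1.0])           # start from a fixed high-pass model
buf.offer([1.81, -0.9], is_noise=False)       # speech frame: buffer unchanged
before = buf.A.copy()
buf.offer([1.81, -0.9], is_noise=True)        # noise frame: buffer refreshed
print(before, buf.A, buf.measure([10.0, 9.0]))
```

Here (1.81, -0.9) is the autocorrelation of the inverse filter (1, -0.9), a plausible low-pass noise model used purely as an example.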

In a modification of this embodiment, the system initially employs Equation 1 with coefficient terms corresponding to a simple fixed high-pass filter, and then subsequently starts to adapt by switching over to using "noise period" LPC coefficients. If, for some reason, speech detection fails, the system may return to using the simple high-pass filter.

It is possible to normalise the above measure by dividing through by R_0, so that the expression to be thresholded has the form

$$ \frac{M}{R_0} = H_0 + 2\sum_{i=1}^{N} \frac{R_i}{R_0} H_i $$

This measure is independent of the total signal energy in a frame and is thus compensated for gross signal level changes, but it gives rather less marked contrast between "noise" and "speech" levels and is hence preferably not employed in high-noise environments.
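A sketch of the normalised form, assuming NumPy; the helper names and the inverse filter (1, -0.9) are illustrative. Scaling the frame by 10 (+20 dB) scales every R_i by 100, so each ratio R_i/R_0, and hence the normalised measure, is unchanged.

```python
import numpy as np


def autocorr(x, max_lag):
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


def normalised_measure(R, H):
    """M / R_0 = H_0 + 2*sum_{i>=1} (R_i / R_0) * H_i."""
    return H[0] + 2.0 * np.dot(R[1:] / R[0], H[1:])


rng = np.random.default_rng(3)
frame = rng.standard_normal(160)
H = autocorr([1.0, -0.9], 1)          # illustrative inverse noise filter (1, -0.9)

quiet = normalised_measure(autocorr(frame, 1), H)
loud = normalised_measure(autocorr(10.0 * frame, 1), H)   # same frame, +20 dB
print(quiet, loud)
```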

Instead of employing LPC analysis to derive the inverse filter coefficients of the noise signal (from either the noise microphone or noise-only periods, as in the various embodiments described above), it is possible to model the inverse noise spectrum using an adaptive filter of known type; as the noise spectrum changes only slowly (as discussed below), the relatively slow coefficient adaption rate common for such filters is acceptable. In one embodiment, which corresponds to Figure 1, LPC analysis unit 13 is simply replaced by an adaptive filter (for example a transversal FIR or lattice filter), connected so as to whiten the noise input by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In a second embodiment, corresponding to that of Figure 2, LPC analysis means 3 is replaced by such an adaptive filter, and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from adapting its coefficients during speech periods.

A second Voice Activity Detector in accordance with another aspect of the invention will now be 
described. 

From the foregoing, it will be apparent that the LPC coefficient vector is simply the impulse response of an FIR filter which has a response approximating the inverse spectral shape of the input signal. When the Itakura-Saito Distortion Measure between adjacent frames is formed, this is in fact equal to the power of the signal as filtered by the LPC filter of the previous frame. So if the spectra of adjacent frames differ little, a correspondingly small amount of the spectral power of a frame will escape filtering and the measure will be low. Correspondingly, a large interframe spectral difference produces a high Itakura-Saito Distortion Measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder it is desirable to minimise the data rate, so the frame length is made as long as possible; in other words, if the frame length is long enough, a speech signal should show a significant spectral change from frame to frame (if it does not, the coding is redundant). Noise, on the other hand, has a slowly varying spectral shape from frame to frame, and so in a period where speech is absent from the signal the Itakura-Saito Distortion Measure will correspondingly be low, since applying the inverse LPC filter from the previous frame "filters out" most of the noise power.
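A sketch of this adjacent-frame measure, assuming NumPy: the previous frame's LPC inverse filter (obtained here by Levinson-Durbin, as one standard method) is applied to the current frame through Equation 1, and the result is divided by the current R_0. The frames are synthetic stand-ins (two frames of the same low-pass noise, then an abrupt change to a high-frequency tone), and the function names are hypothetical.

```python
import numpy as np


def autocorr(x, max_lag):
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(max_lag + 1)])


def levinson(r, order):
    """Levinson-Durbin recursion returning inverse-filter taps (1, a_1..a_N)."""
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def interframe_measure(prev_frame, frame, order=8):
    """Power of `frame` after the previous frame's inverse LPC filter, divided by R_0."""
    A = autocorr(levinson(autocorr(prev_frame, order), order), order)
    R = autocorr(frame, order)
    return (R[0] * A[0] + 2.0 * np.dot(R[1:], A[1:])) / R[0]


def lowpass_noise(rng, n, coef=0.9):
    y, prev = np.empty(n), 0.0
    for t, w in enumerate(rng.standard_normal(n)):
        prev = w + coef * prev
        y[t] = prev
    return y


rng = np.random.default_rng(5)
n = 160
f1, f2 = lowpass_noise(rng, n), lowpass_noise(rng, n)
tone = 3.0 * np.sin(2 * np.pi * 0.4 * np.arange(n))

m_steady = interframe_measure(f1, f2)    # noise followed by similar noise: low
m_change = interframe_measure(f2, tone)  # noise followed by a spectral jump: high
print(m_steady, m_change)
```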

Typically, the Itakura-Saito Distortion Measure between adjacent frames of a noisy signal containing intermittent speech is higher during periods of speech than during periods of noise; its degree of variation (as illustrated by the standard deviation) is also higher, and less intermittently variable.

It is noted that the standard deviation of the standard deviation of M is also a reliable measure; the effect of taking each standard deviation is essentially to smooth the measure.

In this second form of Voice Activity Detector, the measured parameter used to decide whether speech 
is present is preferably the standard deviation of the Itakura-Saito Distortion Measure, but other measures of 
variance and other spectral distortion measures (based, for example, on FFT analysis) could be employed.
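The degree-of-variation statistic can be sketched as a sliding standard deviation over blocks of successive measure values, assuming NumPy. The block length is a hypothetical parameter, the population standard deviation (`np.std` with its default `ddof=0`) is an assumption, and the measure values are invented for illustration. Applying the same function twice gives the smoothed "standard deviation of the standard deviation" mentioned above.

```python
import numpy as np


def block_std(values, block):
    """Standard deviation of every run of `block` successive values."""
    v = np.asarray(values, dtype=float)
    return np.array([np.std(v[i:i + block]) for i in range(len(v) - block + 1)])


# Illustrative per-frame I-S values: steady noise, then erratic speech-like values.
measures = [0.20, 0.22, 0.19, 0.21, 0.20, 1.40, 0.30, 2.10, 0.50, 1.80]
d = block_std(measures, block=4)
dd = block_std(d, block=3)            # smoothed "std of std"
speech = d > 0.3                      # hypothetical threshold on the variation
print(np.round(d, 3), speech)
```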

It is found advantageous to employ an adaptive threshold in voice activity detection. Such thresholds must not be adjusted during speech periods, or the speech signal will be thresholded out. It is accordingly necessary to control the threshold adapter using a speech/non-speech control signal, and it is preferable that this control signal should be independent of the output of the threshold adapter.
The threshold T is adaptively adjusted so as to keep the threshold level just above the level of the measure M when only noise is present. Since the measure will in general vary randomly when noise is present, the threshold is varied by determining an average level over a number of blocks and setting the threshold at a level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an assessment of the degree of variation of the parameter over several blocks is also taken into account.

The threshold value T is therefore preferably calculated according to

$$ T = m' + K d $$

where m' is the average value of the measure over a number of consecutive frames, d is the standard deviation of the measure over those frames, and K is a constant (which may typically be 2).

In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent, but to wait to ensure the fall is stable (to avoid rapid repeated switching between the adapting and non-adapting states).
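The threshold rule and the delayed resumption of adaptation might be combined as below, a sketch assuming NumPy and the standard-library `collections.deque`. The class name, window length, hold-off count and the use of the population standard deviation are all hypothetical choices; here adaptation resumes on the `holdoff`-th consecutive noise frame after speech.

```python
import numpy as np
from collections import deque


class AdaptiveThreshold:
    """T = m' + K*d over recent noise-only values of the measure (hypothetical sketch)."""

    def __init__(self, initial_T, window=20, K=2.0, holdoff=3):
        self.T = initial_T
        self.history = deque(maxlen=window)   # last `window` noise-only measures
        self.K, self.holdoff = K, holdoff
        self.quiet_run = holdoff              # start in the adapting state

    def update(self, m, is_noise):
        if not is_noise:
            self.quiet_run = 0                # speech: freeze T
            return self.T
        self.quiet_run += 1
        if self.quiet_run < self.holdoff:     # wait until the fall looks stable
            return self.T
        self.history.append(m)
        vals = np.array(list(self.history))
        self.T = vals.mean() + self.K * vals.std()   # population std (ddof=0), assumed
        return self.T


thr = AdaptiveThreshold(initial_T=10.0, window=5, K=2.0, holdoff=3)
for m in [1.0, 2.0, 3.0, 4.0, 5.0]:
    thr.update(m, is_noise=True)
print(thr.T)                       # 3 + 2*sqrt(2), about 5.828
thr.update(50.0, is_noise=False)   # speech frame: T held
thr.update(50.0, is_noise=True)    # first noise frame after speech: still held
```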

Referring to Figure 3, in a preferred embodiment of the invention incorporating the above aspects, an input 1 receives a signal which is sampled and digitised by analogue-to-digital converter (ADC) 2 and supplied to the input of an inverse filter analyser 3, which in practice is part of a speech coder with which the voice activity detector is to work, and which generates coefficients L_i (typically 8) of a filter corresponding to the inverse of the input signal spectrum. The digitised signal is also supplied to an autocorrelator 4 (which is part of analyser 3), which generates the autocorrelation vector R_i of the input signal (or at least as many low-order terms as there are LPC coefficients). Operation of these parts of the apparatus is as described for Figures 1 and 2. Preferably, the autocorrelation coefficients R_i are then averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This may be achieved by storing each set of autocorrelation coefficients output by autocorrelator 4 in a buffer 4a, and employing an averager 4b to produce a weighted sum of the current autocorrelation coefficients R_i and those from previous frames stored in and supplied from buffer 4a. The averaged autocorrelation coefficients Ra_i thus derived are supplied to weighting and adding means 5, 6, which also receive the autocorrelation vector A_i of the stored noise-period inverse filter coefficients L_i from an autocorrelator 14 via buffer 15, and form from Ra_i and A_i a measure M preferably defined as:

$$ M = A_0 + 2\sum_{i=1}^{N} \frac{Ra_i}{Ra_0} A_i $$

This measure is then thresholded by thresholder 7 against a threshold level, and the logical result 
provides an indication of the presence or absence of speech at output 8. 
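The averaging and the measure of Figure 3 might be sketched as follows, assuming NumPy. The text asks only for a weighted sum of current and past autocorrelation vectors; the equal-weight mean over a fixed number of frames used here is an assumption, as are the class and function names. The measure is computed in the energy-normalised form M = A_0 + 2 Σ (Ra_i/Ra_0) A_i, i.e. Equation 1 applied to the averaged coefficients and divided by Ra_0.

```python
import numpy as np
from collections import deque


class AutocorrelationAverager:
    """Hypothetical model of buffer 4a + averager 4b (equal weights assumed)."""

    def __init__(self, frames=4):
        self.buf = deque(maxlen=frames)

    def push(self, R):
        self.buf.append(np.asarray(R, dtype=float))
        return np.mean(np.stack(list(self.buf)), axis=0)   # Ra_i


def measure_fig3(Ra, A):
    """M = A_0 + 2*sum_{i>=1} (Ra_i / Ra_0) * A_i."""
    Ra, A = np.asarray(Ra, dtype=float), np.asarray(A, dtype=float)
    return A[0] + 2.0 * np.dot(Ra[1:] / Ra[0], A[1:])


avg = AutocorrelationAverager(frames=2)
avg.push([4.0, 2.0])
Ra = avg.push([2.0, 0.0])          # mean of the last two frames: [3.0, 1.0]
M = measure_fig3(Ra, [2.0, -1.0])  # stored noise model A_i (illustrative)
print(Ra, M)
```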

In order that the inverse filter coefficients L_i correspond to a fair estimate of the inverse of the noise spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update them during periods of speech). It is, however, preferable that the speech/non-speech decision on which the updating is based does not depend upon the result of the updating, or else a single wrongly identified frame of signal may result in the voice activity detector subsequently going "out of lock" and wrongly identifying following frames. Preferably, therefore, there is provided a control signal generating circuit 20, effectively a separate voice activity detector, which forms an independent control signal indicating the presence or absence of speech to control inverse filter analyser 3 (or buffer 15) so that the inverse filter autocorrelation coefficients A_i used to form the measure M are only updated during "noise" periods. The control signal generator circuit 20 includes LPC analyser 21 (which again may be part of a speech coder and, specifically, may be performed by analyser 3), which produces a set of LPC coefficients M_i corresponding to the input signal, and an autocorrelator 21a (which may be performed by autocorrelator 3a), which derives the autocorrelation coefficients B_i of M_i. If analyser 21 is performed by analyser 3, then M_i = L_i and B_i = A_i. These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to 5, 6), which also receive the autocorrelation vector R_i of the input signal from autocorrelator 4. A measure of the spectral similarity between the input speech frame and the preceding speech frame is thus calculated; this may be the Itakura-Saito distortion measure between R_i of the present frame and B_i of the preceding frame, as disclosed above, or it may instead be derived by calculating the Itakura-Saito distortion measure for R_i and B_i of the present frame and subtracting (in subtractor 25) the corresponding measure for the previous frame stored in buffer 24, to generate a spectral difference signal (in either case, the measure is preferably energy-normalised by dividing by R_0). The buffer 24 is then, of course, updated. This spectral difference signal, when thresholded by a thresholder 26, is, as discussed above, an indicator of the presence or absence of speech. We have found, however, that although this measure is excellent for distinguishing noise from unvoiced speech (a task of which prior art systems are generally incapable), it is in general rather less able to distinguish noise from voiced speech. Accordingly, there is preferably further provided within circuit 20 a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate as part of a speech coder, and in particular may measure the long-term predictor lag value produced in a multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is detected, and this signal, together with the thresholded measure derived from thresholder 26 (which will generally be "true" when unvoiced speech is present), is supplied to the inputs of a NOR gate 28 to generate a signal which is "false" when speech is present and "true" when noise is present. This signal is supplied to buffer 15 (or to inverse filter analyser 3) so that inverse filter coefficients L_i are only updated during noise periods.
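The decision logic of control circuit 20 can be sketched in plain Python. The class name, the threshold value, taking the magnitude of the spectral difference before thresholding, and treating the first frame as having no difference are all assumptions made for the illustration. The input `m_is` is the energy-normalised Itakura-Saito measure for the current frame, and `voiced` stands for the pitch analyser's logic output.

```python
class ControlSignal:
    """Hypothetical sketch of circuit 20's decision logic (buffer 24, subtractor 25,
    thresholder 26 and NOR gate 28)."""

    def __init__(self, threshold_26):
        self.threshold_26 = threshold_26
        self.prev = None                     # buffer 24: previous frame's measure

    def step(self, m_is, voiced):
        # Subtractor 25: spectral difference between this frame and the last.
        diff = 0.0 if self.prev is None else m_is - self.prev
        self.prev = m_is                     # buffer 24 is then updated
        unvoiced = abs(diff) > self.threshold_26   # magnitude test is an assumption
        return not (voiced or unvoiced)      # NOR gate 28: True means "noise only"


ctl = ControlSignal(threshold_26=0.5)
steady = ctl.step(0.20, voiced=False)   # first frame, nothing to compare: noise
jump = ctl.step(1.50, voiced=False)     # large spectral change: unvoiced speech
pitch = ctl.step(1.60, voiced=True)     # pitch analyser 27 reports voiced speech
print(steady, jump, pitch)
```

Only a `True` output would allow buffer 15 (and the threshold adapter) to be updated.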

Threshold adapter 29 is also connected to receive the non-speech signal control output of control signal generator circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold adapter operates to increment or decrement the threshold in steps which are a proportion of the current threshold value, until the threshold approximates the noise power level (which may conveniently be derived from, for example, weighting and adding circuits 22, 23). When the input signal is very low, it may be desirable for the threshold to be set automatically to a fixed low level, since at low signal levels the effect of signal quantisation produced by ADC 2 can produce unreliable results.
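A sketch of one update of threshold adapter 29 in plain Python. The step proportion, the low-input limit and the fixed low threshold are hypothetical constants chosen for the example, and `noise_power` stands for whatever noise estimate is taken from circuits 22, 23.

```python
def adapt_threshold(T, noise_power, input_power, is_noise,
                    step=0.05, low_input=1e-4, fixed_low_T=1e-3):
    """One update of a hypothetical threshold adapter 29 (all constants illustrative)."""
    if input_power < low_input:
        return fixed_low_T            # very quiet input: ADC quantisation makes M unreliable
    if not is_noise:
        return T                      # control circuit 20 reports speech: hold T
    if T < noise_power:
        return T * (1.0 + step)       # step up by a proportion of the current value
    if T > noise_power:
        return T * (1.0 - step)       # step down likewise
    return T


T = 1.0
for _ in range(100):
    T = adapt_threshold(T, noise_power=4.0, input_power=1.0, is_noise=True)
print(T)    # settles within about +/-5% of the noise power
```

Because each step is a fixed proportion of T, the adapter tracks both quiet and loud noise floors at the same relative speed.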

There may further be provided "hangover" generating means 30, which operates to measure the duration of indications of speech after thresholder 7 and, when the presence of speech has been indicated for a period in excess of a predetermined time constant, holds the output high for a short "hangover" period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of the time constant prevents triggering of the hangover generator 30 by short spikes of noise which are falsely indicated as speech. It will of course be appreciated that all the above functions may be executed by a single suitably programmed digital processing means, such as a Digital Signal Processing (DSP) chip, as part of an LPC codec thus implemented (this is the preferred implementation), or by a suitably programmed microcomputer or microcontroller chip with an associated memory device.
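The hangover behaviour can be sketched in plain Python over a sequence of per-frame decisions from thresholder 7. The function name and the frame counts `min_run` (the time constant) and `hangover` (the hold period) are hypothetical parameters.

```python
def apply_hangover(decisions, min_run=3, hangover=4):
    """Hypothetical hangover generator 30: after a speech run longer than `min_run`
    frames ends, keep the output high for `hangover` more frames."""
    out, run, hold = [], 0, 0
    for speech in decisions:
        if speech:
            run += 1
            if run > min_run:
                hold = hangover        # burst has outlasted the time constant
            out.append(True)
        else:
            run = 0
            out.append(hold > 0)       # hangover keeps the output high
            hold = max(hold - 1, 0)
    return out


burst = [True] * 5 + [False] * 4
spike = [False, True, False, False]
print(apply_hangover(burst, min_run=3, hangover=2))   # burst extended by 2 frames
print(apply_hangover(spike, min_run=3, hangover=2))   # short spike not extended
```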

Conveniently, as described above, the voice detection apparatus may be implemented as part of an LPC codec. Alternatively, where autocorrelation coefficients of the signal or related measures (partial correlation, or "parcor", coefficients) are transmitted to a distant station, the voice detection may take place distantly from the codec.



Claims

1. Voice activity detection apparatus comprising means for receiving an input signal, means for estimating the noise signal component of the input signal, means for continually forming a measure M of the spectral similarity between a portion of the input signal and the noise signal component, and means for comparing a parameter derived from the measure M with a threshold value T to produce an output to indicate the presence or absence of speech in dependence upon whether or not that value is exceeded.

2. Apparatus according to claim 1, in which the noise estimating means comprises means for computing the autocorrelation coefficients A_i of the impulse response of an FIR filter having a response approximating the inverse of the short-term spectrum of the noise signal component, and the measure forming means comprises means for computing the autocorrelation coefficients R_i of the signal, and means connected to receive R_i and A_i and to calculate M therefrom, the parameter being the value of M.

3. Apparatus according to claim 2, in which

4. Apparatus according to claim 2, in which

5. Apparatus according to any one of claims 2 to 4, further comprising an input arranged to receive a second signal, similarly subject to noise, from which speech is absent, in which the A_i computing means comprises LPC analysis means for deriving values of A_i from the second signal.
6. Apparatus according to any one of claims 2 to 4, further comprising a buffer connected to store data from which the autocorrelation coefficients A_i of the said filter response may be derived, in which the said filter response is periodically calculated from the signal by LPC analysis means, the apparatus being so connected and controlled that the measure M is calculated using the said stored data, and the said stored data is updated only from periods in which speech is indicated to be absent.

7. Apparatus according to any one of claims 1 to 4 in which the noise estimating means includes an 
adaptive filter. 

8. Apparatus according to any one of claims 2 to 6 characterised in that the means for computing the 
autocorrelation coefficients of the signal are arranged to do so in dependence upon the autocorrelation 
coefficients of several successive portions of the signal. 

9. Apparatus according to claim 1 in which the measure M is a spectral distortion measure. 

10. Apparatus according to claim 9 in which the measure M is the Itakura-Saito Distortion measure. 

11. Apparatus according to any one of the preceding claims, further comprising means for adjusting the said predetermined threshold T during periods when speech is indicated to be absent.

12. Apparatus detector according to claim 11, further comprising second voice activity detection means 
arranged to prevent adjustment of the threshold value when speech is present. 

13. Apparatus detector as claimed in claim 11 or claim 12, in which the threshold value T is, when adjusted, adjusted to be equal to the mean of the measure plus a term which is a function of the standard deviation of the measure.

14. Voice activity detection apparatus comprising: means for continually forming a spectral distortion 
measure of the similarity between a portion of the input signal and earlier portions of the input signal and 
means for comparing the degree of variation between successive values of the measure with a threshold 
value to produce an output indicating the presence or absence of speech in dependence upon whether or 
not that value is exceeded. 

15. Apparatus as claimed in claim 14, wherein the degree of variation is measured as the standard 
deviation of a block of successive values of the measure. 

16. Apparatus according to Claim 6 further comprising means for indicating the absence of speech to 
control the updating of the said stored data, the means for indicating the absence of speech being a second 
voice activity detection means. 

17. Apparatus according to Claim 16 and Claim 13 in which the said second voice activity detection 
means controls both threshold adaption and data updating. 

18. Apparatus according to Claim 13 or Claim 16 or Claim 17 in which said second voice activity 
detection means is apparatus according to Claim 14 or Claim 15. 

19. A method of detecting speech activity in a signal, comprising the steps of comparing the signal 
spectrum with an estimated noise spectrum, forming a variable measure of the spectral similarity there- 
between, and comparing that measure with a threshold. 

20. A method of detecting speech activity in a signal, comprising the steps of comparing the signal 
spectrum with a preceding portion of the signal, forming a variable measure of the spectral similarity 
therebetween, and comparing the degree of variation between successive values of the measure with a 
threshold. 

21. Voice activity detection apparatus substantially as herein described, with reference to Figure 1 or 
Figure 2 or Figure 3. 

22. Apparatus for encoding speech signals including apparatus according to any preceding claim. 

23. Mobile telephone apparatus including apparatus according to any preceding claim. 

24. A method of detecting speech substantially as herein described. 
Description

A voice activity detector is a device which is supplied with a signal with the object of detecting periods
of speech, or periods containing only noise. Although the present invention is not limited thereto, one
application of particular interest for such detectors is in mobile radio telephone systems, where
knowledge of the presence or otherwise of speech can be exploited by a speech coder to improve
the efficient utilisation of radio spectrum, and where also the noise level (from a vehicle-mounted unit) is
likely to be high.

The essence of voice activity detection is to locate a measure which differs appreciably between
speech and non-speech periods. In apparatus which includes a speech coder, a number of parameters are
readily available from one or other stage of the coder, and it is therefore desirable to economise on the
processing needed by utilising some such parameter. In many environments, the main noise sources occur
in known, defined areas of the frequency spectrum. For example, in a moving car much of the noise (e.g.
engine noise) is concentrated in the low-frequency region of the spectrum. Where such knowledge of the
spectral position of noise is available, it is desirable to base the decision as to whether speech is present or
absent upon measurements taken from that portion of the spectrum which contains relatively little noise. It
would, of course, be possible in practice to pre-filter the signal before analysing it to detect speech activity,
but where the voice activity detector follows the output of a speech coder, pre-filtering would distort the
voice signal to be coded.

In US4358738, a voice activity detector is disclosed which compares the input signal with predetermined
noise characteristics by filtering the input signal through a pair of manually balanced bandpass filters
(employing analogue components) to form two frequency-dependent energy segments. This method is of
limited usefulness for several reasons: firstly, such a crude arrangement ignores the fact that many types of
noise could have an energy balance between the two bands similar to that of a speech signal; secondly,
balancing the filters is laborious and requires manual detection of noise periods for balancing; and thirdly,
such a device is unable to adjust to changing noise or spectral changes in the environment (or
communications channel).

In IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-25, no. 4, August 1977,
pages 338-343, Rabiner et al., "Application of an LPC distance measure to the voiced-unvoiced-silence
detection problem", there is disclosed a classifier for discriminating between silence, unvoiced speech and
voiced speech which has been transmitted over a telephone line. The method comprises initially using
manually classified "silence", "voiced" and "unvoiced" frames of speech signals to derive reference
patterns, and then comparing the input signal to each of these using a comparison measure and selecting
the reference pattern to which the input signal is closest. This method shares some of the disadvantages of
US4358738, in that it requires extensive manual intervention in selecting "silence" frames from training data
and forming the reference pattern therefrom, and in that, since the reference pattern is fixed, changes in the
environment result in wrong identifications. These problems are greatly exacerbated in high-level noise
environments (such as a moving vehicle) compared to the low-level noise environment (silence over a
telephone line) described by Rabiner.

European patent application published as EP-A-0127718 and US patent 4672669 describe a voice
activity detection apparatus in which a first test is made on signal amplitude and a second test is based on
analysis of changes in the short-term signal spectrum. Specifically, the spectral analysis is performed by
comparing the autocorrelation of the signal with that of an earlier portion of the signal deemed to be
speech-free.

According to one aspect of the present invention there is provided a voice activity detection apparatus
comprising:

(i) means for receiving a first, input, signal;

(ii) means for periodically adaptively generating a second signal representing an estimated noise signal
component of the first signal;

(iii) means for periodically forming from the first and second signals a measure of the spectral similarity
between a portion of the input signal and the said estimated noise signal component; and
(iv) means for comparing the measure with a threshold value to produce an output indicating the
presence or absence of speech;
in which

(v) the generating means includes analysis means operable to produce the coefficients of a filter having
a spectral response which is the inverse of the frequency spectrum of one of the said two signals; and
(vi) the measure forming means are operable to produce a measure which is proportional to the
zero-order autocorrelation of the other of the said two signals after filtering by a filter having the said
coefficients.

In another aspect, the invention provides a method of detecting voice activity in a first, input, signal,
comprising

(a) periodically adaptively generating a second signal representing an estimated noise signal component
of the first signal;

(b) periodically forming from the first and second signals a measure of the spectral similarity between a
portion of the input signal and the said estimated noise signal component; and

(c) comparing the measure with a threshold value to produce an output indicating the presence or
absence of speech;

in which

(d) the generating step includes producing the coefficients of a filter having a spectral response which is
the inverse of the frequency spectrum of one of the said two signals; and

(e) the measure is proportional to the zero-order autocorrelation of the other of the said two signals after
filtering by a filter having the said coefficients.

Other aspects of the present invention are as defined in the claims.

Some embodiments of the invention will now be described, by way of example, with reference to the 
accompanying drawings, in which: 

Figure 1 is a block diagram of a first embodiment of the invention;
Figure 2 shows a second embodiment of the invention;
Figure 3 shows a third, preferred embodiment of the invention.

The general principle underlying a first Voice Activity Detector according to a first embodiment of
the invention is as follows.

A frame of n signal samples (s_0, s_1, s_2, s_3, s_4, ..., s_{n-1}) will, when passed through a notional
fourth-order finite impulse response (FIR) digital filter of impulse response (1, h_0, h_1, h_2, h_3), result in
a filtered signal (ignoring samples from previous frames)

$$
\begin{aligned}
s' = \big(\,& s_0,\\
& s_1 + h_0 s_0,\\
& s_2 + h_0 s_1 + h_1 s_0,\\
& s_3 + h_0 s_2 + h_1 s_1 + h_2 s_0,\\
& s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0,\\
& s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1,\\
& s_6 + h_0 s_5 + h_1 s_4 + h_2 s_3 + h_3 s_2,\\
& s_7 + \ldots\,\big)
\end{aligned}
$$

The zero-order autocorrelation coefficient is the sum of the squares of the terms, which may be normalised,
i.e. divided by the total number of terms (for constant frame lengths it is easier to omit the division); that
of the filtered signal is thus

$$
R'_0 = \sum_{j=0}^{n-1} \left(s'_j\right)^2
$$

and this is therefore a measure of the power of the notional filtered signal s', in other words of that part of
the signal s which falls within the passband of the notional filter.
Expanding, and neglecting the first 4 terms (the start-up transient of the filter),
$$
R'_0 = (s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 + (s_5 + h_0 s_4 + h_1 s_3 + h_2 s_2 + h_3 s_1)^2 + \ldots
$$

where the first squared term expands to

$$
\begin{aligned}
(s_4 + h_0 s_3 + h_1 s_2 + h_2 s_1 + h_3 s_0)^2 ={}& s_4^2 + h_0^2 s_3^2 + h_1^2 s_2^2 + h_2^2 s_1^2 + h_3^2 s_0^2\\
&+ 2h_0 s_4 s_3 + 2h_1 s_4 s_2 + 2h_2 s_4 s_1 + 2h_3 s_4 s_0\\
&+ 2h_0 h_1 s_3 s_2 + 2h_0 h_2 s_3 s_1 + 2h_0 h_3 s_3 s_0\\
&+ 2h_1 h_2 s_2 s_1 + 2h_1 h_3 s_2 s_0 + 2h_2 h_3 s_1 s_0
\end{aligned}
$$

and likewise for the later terms. Collecting, over all the squared terms, the products s_j s_{j-k}, whose
sums over the frame form the signal autocorrelation coefficients R_k,

$$
\begin{aligned}
R'_0 ={}& R_0 (1 + h_0^2 + h_1^2 + h_2^2 + h_3^2)\\
&+ R_1 (2h_0 + 2h_0 h_1 + 2h_1 h_2 + 2h_2 h_3)\\
&+ R_2 (2h_1 + 2h_1 h_3 + 2h_0 h_2)\\
&+ R_3 (2h_2 + 2h_0 h_3)\\
&+ R_4 (2h_3)
\end{aligned}
$$



So R'_0 can be obtained from a combination of the autocorrelation coefficients R_i, weighted by the
bracketed constants, which determine the frequency band to which the value of R'_0 is responsive. In fact,
the bracketed terms are (twice, for i ≥ 1) the autocorrelation coefficients of the impulse response of the
notional filter, so that the expression above may be simplified to



$$
R'_0 = R_0 H_0 + 2 \sum_{i=1}^{N} R_i H_i \qquad (1)
$$

where N is the filter order and H_i are the (un-normalised) autocorrelation coefficients of the impulse
response of the filter.

In other words, the effect of filtering on the autocorrelation coefficients of a signal may be simulated
by producing a weighted sum of the autocorrelation coefficients of the (unfiltered) signal, using the
autocorrelation of the impulse response that the required filter would have had.
Thus, a relatively simple algorithm, involving a small number of multiplication operations, may simulate
the effect of a digital filter requiring typically a hundred times this number of multiplication operations.
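The identity behind equation 1 can be checked numerically. The following is a minimal plain-Python sketch; the helper names (`autocorr`, `convolve_full`, `equation_1`) and the filter and sample values are illustrative assumptions. It keeps the start-up and tail samples of the filtered frame (full linear convolution), in which case the direct energy and the weighted autocorrelation sum agree exactly; the frame-based measure described here neglects those edge terms and is therefore an approximation.

```python
def autocorr(x, max_lag):
    """Un-normalised aperiodic autocorrelation: r[k] = sum_j x[j] * x[j + k], k = 0..max_lag."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def convolve_full(s, c):
    """Direct FIR filtering: full linear convolution of frame s with impulse response c."""
    y = [0.0] * (len(s) + len(c) - 1)
    for i, si in enumerate(s):
        for j, cj in enumerate(c):
            y[i + j] += si * cj
    return y


def equation_1(R, H):
    """R'_0 = R_0 H_0 + 2 * sum_{i=1..N} R_i H_i, with N = len(H) - 1."""
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, len(H)))


# Notional 4th-order filter with impulse response (1, h0, h1, h2, h3); values are arbitrary.
h0, h1, h2, h3 = -0.9, 0.4, -0.2, 0.05
H = autocorr([1.0, h0, h1, h2, h3], 4)   # H[1] = h0 + h0*h1 + h1*h2 + h2*h3, and so on

s = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1, 1.1, -0.7, 0.4, 0.9]   # toy frame
R = autocorr(s, 4)                       # normally supplied by the coder's LPC analysis

direct = sum(y * y for y in convolve_full(s, [1.0, h0, h1, h2, h3]))  # about n*(N+1) products
via_eq1 = equation_1(R, H)                                            # N+1 products, given R
print(abs(direct - via_eq1) < 1e-9)      # True
```

The saving comes from the fact that R_i is already computed by the LPC analysis, so only the short weighted sum remains per frame.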

This filtering operation may alternatively be viewed as a form of spectrum comparison, with the signal 
spectrum being matched against a reference spectrum (the inverse of the response of the notional filter). 
Since the notional filter in this application is selected so as to approximate the inverse of the noise
spectrum, this operation may be viewed as a spectral comparison between speech and noise spectra, and
the zeroth autocorrelation coefficient thus generated (i.e. the energy of the inverse-filtered signal) as a
measure of dissimilarity between the spectra. The Itakura-Saito distortion measure is used in LPC to assess
the match between the predictor filter and the input spectrum, and in one form is expressed as



where A_0 etc. are the autocorrelation coefficients of the LPC parameter set. It will be seen that this is
closely similar to the relationship derived above, and when it is remembered that the LPC coefficients are
the taps of an FIR filter having the inverse spectral response of the input signal, so that the LPC coefficient
set is the impulse response of the inverse LPC filter, it will be apparent that the Itakura-Saito distortion
measure is in fact merely a form of equation 1, wherein the filter response H is the inverse of the spectral
shape of an all-pole model of the input signal.

In fact, it is also possible to transpose the spectra, using the LPC coefficients of the test spectrum and
the autocorrelation coefficients of the reference spectrum, to obtain a different measure of spectral
similarity.

The I-S distortion measure is further discussed in "Speech Coding based upon Vector Quantisation" by
A. Buzo, A. H. Gray, R. M. Gray and J. D. Markel, IEEE Trans. on ASSP, vol. ASSP-28, no. 5, October 1980.

Since the frames of signal have only a finite length, and a number of terms (N, where N is the filter
order) are neglected, the above result is only an approximation; it nevertheless gives a surprisingly good
indicator of the presence or absence of speech, and thus may be used as a measure M in speech detection.
In an environment where the noise spectrum is well known and stationary, it is quite possible simply to
employ fixed coefficients h_0, h_1, etc. to model the inverse noise filter.

However, apparatus which can adapt to different noise environments is much more widely useful. 

Referring to Figure 1, in a first embodiment, a signal from a microphone (not shown) is received at an
input 1 and converted to digital samples s at a suitable sampling rate by an analogue-to-digital converter 2.
An LPC analysis unit 3 (in an LPC coder of known type) then derives, for successive frames of n (e.g. 160)
samples, a set of N (e.g. 8 or 12) LPC filter coefficients L_i, which are transmitted to represent the input
speech. The speech signal s also enters a correlator unit 4 (normally part of the LPC coder 3, since the
autocorrelation vector R_i of the speech is usually produced as a step in the LPC analysis, although it
will be appreciated that a separate correlator could be provided). The correlator 4 produces the
autocorrelation vector R_i, including the zero-order correlation coefficient R_0 and at least 2 further
autocorrelation coefficients R_1, R_2, R_3. These are then supplied to a multiplier unit 5.

A second input 11 is connected to a second microphone located distant from the speaker so as to
receive only background noise. The input from this microphone is converted to a digital sample train
by A/D converter 12 and LPC-analysed by a second LPC analyser 13. The "noise" LPC coefficients
produced by analyser 13 are passed to correlator unit 14, and the autocorrelation vector thus produced is
multiplied term by term with the autocorrelation coefficients R_i of the input signal from the speech
microphone in multiplier 5. The weighted coefficients thus produced are combined in adder 6 according
to equation 1, so as to apply a filter having the inverse shape of the noise spectrum from the noise-only
microphone (which in practice is the same as the shape of the noise spectrum in the signal-plus-noise
microphone) and thus filter out most of the noise. The resulting measure M is thresholded by thresholder 7
to produce a logic output 8 indicating the presence or absence of speech; if M is high, speech is deemed to
be present.
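A compact sketch of the Figure 1 signal path follows, in plain Python. The Levinson-Durbin recursion stands in for LPC analyser 13; it is a standard way to obtain LPC coefficients from autocorrelations, but its use here is an assumption rather than something the text specifies. The helper names, the AR(1) model of low-frequency noise and the sinusoidal stand-in for speech are all illustrative.

```python
import math
import random


def autocorr(x, max_lag):
    """Un-normalised aperiodic autocorrelation r[k] = sum_j x[j] * x[j + k]."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Inverse (prediction-error) filter [1, a_1, ..., a_order] from r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err  # reflection coeff.
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a


def figure1_measure(speech_frame, noise_frame, order):
    """Equation 1 with A_i from the noise-only microphone (hypothetical helper)."""
    lpc = levinson_durbin(autocorr(noise_frame, order), order)   # LPC analyser 13
    A = autocorr(lpc, order)                                      # correlator 14
    R = autocorr(speech_frame, order)                             # correlator 4
    return R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1))  # units 5, 6


def ar1(c, n, rng, burn=200):
    """Synthetic low-frequency noise x[t] = c * x[t-1] + w[t] (an assumed noise model)."""
    x, out = 0.0, []
    for t in range(n + burn):
        x = c * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


rng = random.Random(0)
n, order = 160, 8
noise_ref = ar1(0.95, n, rng)                          # input 11 (noise-only microphone)
noise_in = ar1(0.95, n, rng)                           # input 1 during a pause
tone = [3.0 * math.sin(0.8 * math.pi * t) for t in range(n)]   # crude high-frequency "speech"
speech_in = [a + b for a, b in zip(ar1(0.95, n, rng), tone)]

M_noise = figure1_measure(noise_in, noise_ref, order)
M_speech = figure1_measure(speech_in, noise_ref, order)
print(M_noise, M_speech)   # thresholder 7 would compare each M with a threshold T
```

Because the inverse noise filter suppresses the low-frequency band, the noise-only frame yields a small M while the high-frequency component passes almost unattenuated.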

This embodiment does, however, require two microphones and two LPC analysers, which adds to the
expense and complexity of the equipment necessary.

Alternatively, another embodiment uses a corresponding measure formed using the autocorrelations 
from the noise microphone 11 and the LPC coefficients from the main microphone 1, so that an extra 
autocorrelator rather than an LPC analyser is necessary. 
These embodiments are therefore able to operate within different environments having noise at different
frequencies, or within a changing noise spectrum in a given environment.

Referring to Figure 2, in the preferred embodiment of the invention, there is provided a buffer 15 which
stores a set of LPC coefficients (or the autocorrelation vector of the set) derived from the microphone input
1 in a period identified as being a "non-speech" (i.e. noise-only) period. These coefficients are then used to
derive a measure using equation 1, which of course also corresponds to the Itakura-Saito distortion
measure, except that a single stored frame of LPC coefficients corresponding to an approximation of the
inverse noise spectrum is used, rather than the present frame of LPC coefficients.

The LPC coefficient vector L_i output by analyser 3 is also routed to a correlator 14, which produces the
autocorrelation vector of the LPC coefficient vector. The buffer memory 15 is controlled by the
speech/non-speech output of thresholder 7, in such a way that during "speech" frames the buffer retains the
"noise" autocorrelation coefficients, but during "noise" frames a new set of LPC coefficients may be used to
update the buffer, for example by a multiple switch 16 via which the outputs of the correlator 14, each
carrying one autocorrelation coefficient, are connected to the buffer 15. It will be appreciated that correlator
14 could be positioned after buffer 15. Further, the speech/no-speech decision for coefficient update need
not be taken from output 8, but could be (and preferably is) derived otherwise.

Since periods without speech occur frequently, the LPC coefficients stored in the buffer are updated from
time to time, so that the apparatus is capable of tracking changes in the noise spectrum. It will be
appreciated that such updating of the buffer may be necessary only occasionally, or may occur only once
at the start of operation of the detector, if (as is often the case) the noise spectrum is relatively stationary
over time; in a mobile radio environment, however, frequent updating is preferred.

In a modification of this embodiment, the system initially employs equation 1 with coefficient terms
corresponding to a simple fixed high-pass filter, and then subsequently starts to adapt by switching over to
using "noise period" LPC coefficients. If, for some reason, speech detection fails, the system may return to
using the simple high-pass filter.
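The buffering of Figure 2, together with the fixed high-pass starting point and fallback just described, might be organised as below. This is a hypothetical sketch: the class name, the choice of a first-difference filter (1 - z^-1) as the "simple fixed high-pass filter", and the externally supplied speech flag are assumptions, not details from the text.

```python
def autocorr(x, max_lag):
    """Un-normalised aperiodic autocorrelation r[k] = sum_j x[j] * x[j + k]."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


class BufferedNoiseModel:
    """Hypothetical model of buffer 15 and switch 16 (Figure 2).

    Holds the autocorrelation A_i of an inverse-filter impulse response. It starts from a
    fixed high-pass filter (a first difference, assumed for illustration) and is overwritten
    only by frames that the control signal marks as noise.
    """

    def __init__(self, order):
        self.order = order
        self.fallback = autocorr([1.0, -1.0] + [0.0] * (order - 1), order)
        self.A = list(self.fallback)

    def measure(self, R):
        """Equation 1 using the stored coefficients."""
        return R[0] * self.A[0] + 2.0 * sum(R[i] * self.A[i] for i in range(1, self.order + 1))

    def update(self, frame_lpc, speech_present):
        """Switch 16 closes only during noise frames; speech frames leave the buffer untouched."""
        if not speech_present:
            self.A = autocorr(frame_lpc, self.order)   # correlator 14 -> buffer 15

    def reset(self):
        """Return to the simple high-pass filter, e.g. if detection is judged to have failed."""
        self.A = list(self.fallback)


model = BufferedNoiseModel(order=3)
print(model.measure([10.0, 9.0, 8.0, 7.0]))    # low-pass-like frame: 2.0
print(model.measure([10.0, -9.0, 8.0, -7.0]))  # high-pass-like frame: 38.0
```

As the text recommends, the `speech_present` flag passed to `update` would preferably come from a detector other than the one whose output this buffer feeds.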

It is possible to normalise the above measure by dividing through by R_0, so that the expression to be
thresholded has the form

$$
\frac{R'_0}{R_0} = H_0 + 2 \sum_{i=1}^{N} \frac{R_i}{R_0} H_i
$$

This measure is independent of the total signal energy in a frame and is thus compensated for gross changes
in signal level, but it gives rather less marked contrast between "noise" and "speech" levels and is hence
preferably not employed in high-noise environments.
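As a small check of the gain-independence claim, the sketch below (illustrative names and values) scales a frame by 10, i.e. 20 dB: the un-normalised measure grows by a factor of 100, while the normalised form is unchanged.

```python
def autocorr(x, max_lag):
    """Un-normalised aperiodic autocorrelation r[k] = sum_j x[j] * x[j + k]."""
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def unnormalised_measure(R, H):
    """Equation 1: R_0 H_0 + 2 * sum_{i=1..N} R_i H_i."""
    return R[0] * H[0] + 2.0 * sum(R[i] * H[i] for i in range(1, len(H)))


def normalised_measure(R, H):
    """R'_0 / R_0 = H_0 + 2 * sum_{i=1..N} (R_i / R_0) H_i; a silent frame (R_0 == 0) gives 0."""
    if R[0] == 0.0:
        return 0.0
    return H[0] + 2.0 * sum((R[i] / R[0]) * H[i] for i in range(1, len(H)))


H = autocorr([1.0, -0.9], 1)                 # inverse of a low-pass noise shape (illustrative)
frame = [0.5, -0.2, 0.9, 0.1, -0.7, 0.3]
loud = [10.0 * x for x in frame]             # same spectrum, 20 dB louder

ratio = unnormalised_measure(autocorr(loud, 1), H) / unnormalised_measure(autocorr(frame, 1), H)
diff = normalised_measure(autocorr(loud, 1), H) - normalised_measure(autocorr(frame, 1), H)
print(ratio, diff)                           # ~100, ~0
```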

Instead of employing LPC analysis to derive the inverse filter coefficients of the noise signal (from either
the noise microphone or noise-only periods, as in the various embodiments described above), it is possible
to model the inverse noise spectrum using an adaptive filter of known type; as the noise spectrum changes
only slowly (as discussed below), a relatively slow coefficient adaptation rate, common for such filters, is
acceptable. In one embodiment, which corresponds to Figure 1, LPC analysis unit 13 is simply replaced by
an adaptive filter (for example a transversal FIR or lattice filter), connected so as to whiten the noise input
by modelling the inverse filter, and its coefficients are supplied as before to autocorrelator 14.

In a second embodiment, corresponding to that of Figure 2, LPC analysis means 3 is replaced by such 
an adaptive filter, and buffer means 15 is omitted, but switch 16 operates to prevent the adaptive filter from 
adapting its coefficients during speech periods. 
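One "adaptive filter of known type" that could play this role is a transversal FIR prediction-error filter adapted by LMS. The sketch below assumes a power-normalised LMS update (step size divided by a running estimate of input power); the class name, step sizes and initial power are illustrative, and a lattice filter would be an equally valid choice. The `adapt` flag stands in for switch 16, freezing the coefficients during speech.

```python
import random


class AdaptiveWhitener:
    """Transversal FIR prediction-error filter e[t] = x[t] + sum_j a_j x[t-j], adapted by LMS
    with the step normalised by a running power estimate (an assumed, common choice).
    coefficients() plays the role of the inverse-noise-filter impulse response."""

    def __init__(self, order, mu=0.01, beta=0.01, init_power=1.0, eps=1e-9):
        if order < 1:
            raise ValueError("order must be at least 1")
        self.a = [0.0] * order        # a_1 .. a_N
        self.hist = [0.0] * order     # x[t-1] .. x[t-N]
        self.mu, self.beta, self.eps = mu, beta, eps
        self.power = init_power       # running estimate of E[x^2]

    def step(self, x, adapt=True):
        e = x + sum(aj * hj for aj, hj in zip(self.a, self.hist))   # whitened output
        if adapt:   # switch 16: adaptation is frozen while speech is indicated
            self.power = (1.0 - self.beta) * self.power + self.beta * x * x
            g = self.mu / (self.eps + len(self.a) * self.power)
            self.a = [aj - g * e * hj for aj, hj in zip(self.a, self.hist)]   # descend on e^2
        self.hist = [x] + self.hist[:-1]
        return e

    def coefficients(self):
        return [1.0] + self.a


# Whiten synthetic AR(1) noise x[t] = 0.9 x[t-1] + w[t]; the ideal inverse filter is 1 - 0.9 z^-1.
rng = random.Random(1)
w = AdaptiveWhitener(order=1)
x = 0.0
for _ in range(20000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    w.step(x)
print(w.coefficients())   # close to [1.0, -0.9]

frozen = list(w.a)
for _ in range(100):      # a "speech" stretch: the filter runs but does not adapt
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    w.step(x, adapt=False)
```

A small step size gives the slow adaptation the text considers acceptable for a slowly changing noise spectrum.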

A second Voice Activity Detector for use with another embodiment of the invention will now be
described.

From the foregoing, it will be apparent that the LPC coefficient vector is simply the impulse response of
an FIR filter which has a response approximating the inverse spectral shape of the input signal. When the
Itakura-Saito distortion measure between adjacent frames is formed, it is in fact equal to the power of the
signal as filtered by the LPC filter of the previous frame. So if the spectra of adjacent frames differ little, a
correspondingly small amount of the spectral power of a frame will escape filtering and the measure will be
low. Correspondingly, a large interframe spectral difference produces a high Itakura-Saito distortion
measure, so that the measure reflects the spectral similarity of adjacent frames. In a speech coder it is
desirable to minimise the data rate, so the frame length is made as long as possible; in other words, if the
frame length is long enough, a speech signal should show a significant spectral change from frame to
frame (if it does not, the coding is redundant). Noise, on the other hand, has a slowly varying spectral shape
from frame to frame, and so in a period where speech is absent from the signal the Itakura-Saito
distortion measure will correspondingly be low, since applying the inverse LPC filter from the previous
frame "filters out" most of the noise power.

Typically, the Itakura-Saito distortion measure between adjacent frames of a noisy signal containing
intermittent speech is higher during periods of speech than during periods of noise; the degree of variation
(as illustrated by the standard deviation) is also higher, and less intermittently variable.
It is noted that the standard deviation of the standard deviation of M is also a reliable measure; the
effect of taking each standard deviation is essentially to smooth the measure.

In this second form of Voice Activity Detector, the measured parameter used to decide whether speech 
is present is preferably the standard deviation of the Itakura-Saito Distortion Measure, but other measures of 
variance and other spectral distortion measures (based for example on FFT analysis) could be employed. 
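The behaviour described above can be illustrated with synthetic data. In the sketch below, an assumed setup rather than the patented circuit, each frame is inverse-filtered (via equation 1) with the LPC filter of the previous frame, the result is normalised by R_0, and the standard deviation is taken over blocks of successive values. Stationary AR(1) noise stands in for noise, and AR(1) frames whose coefficient is redrawn every frame stand in for the frame-to-frame spectral change of speech; frame length, order and block size are illustrative.

```python
import random
import statistics


def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Inverse (prediction-error) filter [1, a_1, ..., a_order] from r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + sum(a[j] * r[m - j] for j in range(1, m))) / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a


def interframe_distortion(prev_frame, frame, order):
    """Energy of `frame` after inverse filtering with the LPC filter of `prev_frame`
    (equation 1), divided by R_0 of `frame`."""
    A = autocorr(levinson_durbin(autocorr(prev_frame, order), order), order)
    R = autocorr(frame, order)
    return (R[0] * A[0] + 2.0 * sum(R[i] * A[i] for i in range(1, order + 1))) / R[0]


def rolling_std(values, block):
    """Standard deviation of each block of `block` successive values."""
    return [statistics.pstdev(values[k - block:k]) for k in range(block, len(values) + 1)]


def ar1_frame(c, n, rng, burn=100):
    x, out = 0.0, []
    for t in range(n + burn):
        x = c * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(x)
    return out


def stds(frames):
    d = [interframe_distortion(frames[k - 1], frames[k], order) for k in range(1, len(frames))]
    return rolling_std(d, block)


rng = random.Random(2)
n, order, block = 160, 4, 8
noise_frames = [ar1_frame(0.5, n, rng) for _ in range(40)]                    # stationary
speech_like = [ar1_frame(rng.uniform(-0.9, 0.9), n, rng) for _ in range(40)]  # shape changes

noise_std = statistics.mean(stds(noise_frames))
speech_std = statistics.mean(stds(speech_like))
print(noise_std, speech_std)
```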

It is found advantageous to employ an adaptive threshold in voice activity detection. Such a threshold
must not be adjusted during speech periods, or the speech signal will be thresholded out. It is accordingly
necessary to control the threshold adapter using a speech/non-speech control signal, and it is preferable
that this control signal should be independent of the output of the threshold adapter.

The threshold T is adaptively adjusted so as to keep the threshold level just above the level of the measure
M when only noise is present. Since the measure will in general vary randomly when noise is present, the
threshold is varied by determining an average level over a number of blocks and setting the threshold at a
level proportional to this average. In a noisy environment this is not usually sufficient, however, and so an
assessment of the degree of variation of the parameter over several blocks is also taken into account.
The threshold value T is therefore preferably calculated according to

$$
T = M' + K\,d
$$

where M' is the average value of the measure over a number of consecutive frames, d is the standard
deviation of the measure over those frames, and K is a constant (which may typically be 2).
In practice, it is preferred not to resume adaptation immediately after speech is indicated to be absent,
but to wait to ensure that the fall is stable (to avoid rapid repeated switching between the adapting and
non-adapting states).
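A minimal sketch of this threshold rule follows. The history length, the hold-off of five frames before adaptation resumes, and the initial threshold are assumptions for illustration; `speech_present` is expected to come from an independent control signal, as recommended above.

```python
import statistics
from collections import deque


class AdaptiveThreshold:
    """Hypothetical threshold adapter: T = M' + K*d over recent noise-only frames.

    Adapts only when the control signal says speech is absent, and waits `holdoff`
    consecutive noise frames after speech before resuming adaptation.
    """

    def __init__(self, history=20, k=2.0, holdoff=5, initial=1.0):
        self.values = deque(maxlen=history)   # recent noise-frame values of M
        self.k = k
        self.holdoff = holdoff
        self.quiet_run = 0                    # consecutive noise frames seen
        self.T = initial

    def update(self, m, speech_present):
        if speech_present:
            self.quiet_run = 0                # freeze: never adapt during speech
            return self.T
        self.quiet_run += 1
        if self.quiet_run > self.holdoff:     # the fall is judged stable
            self.values.append(m)
            if len(self.values) >= 2:
                vals = list(self.values)
                mean = statistics.mean(vals)                          # M'
                self.T = mean + self.k * statistics.pstdev(vals, mean)  # + K*d
        return self.T


thr = AdaptiveThreshold(history=20, k=2.0, holdoff=5)
for i in range(40):                           # noise-only frames, M alternating 1.0 and 3.0
    thr.update(1.0 if i % 2 == 0 else 3.0, speech_present=False)
print(thr.T)                                  # 4.0 (mean 2.0 + 2 * std 1.0)
```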

Referring to Figure 3, in a preferred embodiment of the invention incorporating the above aspects, an
input 1 receives a signal which is sampled and digitised by analogue-to-digital converter (ADC) 2 and
supplied to the input of an inverse filter analyser 3, which in practice is part of a speech coder with which
the voice activity detector is to work, and which generates coefficients L_i (typically 8) of a filter
corresponding to the inverse of the input signal spectrum. The digitised signal is also supplied to an
autocorrelator 4 (which is part of analyser 3), which generates the autocorrelation vector R_i of the input
signal (or at least as many low-order terms as there are LPC coefficients). Operation of these parts of the
apparatus is as described for Figures 1 and 2. Preferably, the autocorrelation coefficients R_i are then
averaged over several successive speech frames (typically 5-20 ms long) to improve their reliability. This
may be achieved by storing each set of autocorrelation coefficients output by autocorrelator 4 in a buffer
4a, and employing an averager 4b to produce a weighted sum of the current autocorrelation coefficients
R_i and those from previous frames stored in and supplied from buffer 4a. The averaged autocorrelation
coefficients Ra_i thus derived are supplied to weighting and adding means 5, 6, which also receive the
autocorrelation vector A_i of the stored noise-period inverse filter coefficients L_i from an autocorrelator 14
via buffer 15, and which form from Ra_i and A_i a measure M preferably defined as:

$$
M = A_0 + 2 \sum_{i=1}^{N} \frac{\mathit{Ra}_i}{\mathit{Ra}_0} A_i
$$

This measure is then thresholded by thresholder 7 against a threshold level, and the logical result
provides an indication of the presence or absence of speech at output 8.
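The averaging performed by buffer 4a and averager 4b might look like the sketch below. The text does not specify the weights, so the default weights here are hypothetical, as is the renormalisation used while the buffer is still filling.

```python
from collections import deque


class AutocorrelationAverager:
    """Hypothetical sketch of buffer 4a + averager 4b: a weighted sum of the current R_i
    and those of previous frames, giving the averaged coefficients Ra_i."""

    def __init__(self, weights=(0.4, 0.3, 0.2, 0.1)):
        self.weights = weights                      # weights[0] applies to the current frame
        self.buffer = deque(maxlen=len(weights))    # newest frame at the left

    def push(self, R):
        self.buffer.appendleft(list(R))
        used = self.weights[:len(self.buffer)]      # fewer frames while the buffer fills
        total = sum(used)
        return [sum(w * r[i] for w, r in zip(used, self.buffer)) / total
                for i in range(len(R))]


avg = AutocorrelationAverager(weights=(0.5, 0.5))
print(avg.push([2.0, 1.0]))   # [2.0, 1.0]  (only one frame so far)
print(avg.push([4.0, 3.0]))   # [3.0, 2.0]
print(avg.push([6.0, 5.0]))   # [5.0, 4.0]  (the oldest frame has dropped out)
```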

In order that the inverse filter coefficients L_i correspond to a fair estimate of the inverse of the noise
spectrum, it is desirable to update these coefficients during periods of noise (and, of course, not to update
them during periods of speech). It is, however, preferable that the speech/non-speech decision on which the
updating is based does not depend upon the result of the updating, or else a single wrongly identified frame
of signal may result in the voice activity detector subsequently going "out of lock" and wrongly identifying
following frames. Preferably, therefore, there is provided a control signal generating circuit 20, effectively a
separate voice activity detector, which forms an independent control signal indicating the presence or
absence of speech to control inverse filter analyser 3 (or buffer 8) so that the inverse filter autocorrelation
coefficients A_i used to form the measure M are only updated during "noise" periods. The control signal
generator circuit 20 includes LPC analyser 21 (which again may be part of a speech coder and, specifically,
may be performed by analyser 3), which produces a set of LPC coefficients M_i corresponding to the input
signal, and an autocorrelator 21a (which may be performed by autocorrelator 3a), which derives the
autocorrelation coefficients B_i of M_i. If analyser 21 is performed by analyser 3, then M_i = L_i and B_i = A_i.
These autocorrelation coefficients are then supplied to weighting and adding means 22, 23 (equivalent to 5,
6), which also receive the autocorrelation vector R_i of the input signal from autocorrelator 4. A measure of
the spectral similarity between the input speech frame and the preceding speech frame is thus calculated;
this may be the Itakura-Saito distortion measure between R_i of the present frame and B_i of the preceding
frame, as disclosed above, or it may instead be derived by calculating the Itakura-Saito distortion measure
for R_i and B_i of the present frame and subtracting (in subtractor 25) the corresponding measure for the
previous frame stored in buffer 24, to generate a spectral difference signal (in either case, the measure is
preferably energy-normalised by dividing by R_0). The buffer 24 is then, of course, updated. This spectral
difference signal, when thresholded by a thresholder 26, is, as discussed above, an indicator of the presence
or absence of speech. We have found, however, that although this measure is excellent for distinguishing
noise from unvoiced speech (a task of which prior art systems are generally incapable), it is in general rather
less able to distinguish noise from voiced speech. Accordingly, there is preferably further provided within
circuit 20 a voiced speech detection circuit comprising a pitch analyser 27 (which in practice may operate
as part of a speech coder, and in particular may measure the long-term predictor lag value produced in a
multipulse LPC coder). The pitch analyser 27 produces a logic signal which is "true" when voiced speech is
detected, and this signal, together with the thresholded measure derived from thresholder 26 (which will
generally be "true" when unvoiced speech is present), is supplied to the inputs of a NOR gate 28 to
generate a signal which is "false" when speech is present and "true" when noise is present. This signal is
supplied to buffer 8 (or to inverse filter analyser 3) so that the inverse filter coefficients L_i are only updated
during noise periods.
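The decision logic of control circuit 20 reduces to a few lines. In this hypothetical sketch the per-frame, energy-normalised Itakura-Saito measure and the voiced flag from pitch analyser 27 are assumed to be computed elsewhere; comparing the magnitude of the difference from subtractor 25 with the threshold of thresholder 26, and the threshold value itself, are assumptions.

```python
class ControlSignalGenerator:
    """Hypothetical sketch of control circuit 20 (the second, independent VAD).

    Per frame it takes the energy-normalised Itakura-Saito measure of the present frame
    and the voiced flag from pitch analyser 27. It returns True when noise is present
    (output of NOR gate 28), i.e. when coefficients and threshold may be updated.
    """

    def __init__(self, diff_threshold):
        self.diff_threshold = diff_threshold
        self.prev_measure = None                      # buffer 24

    def noise_present(self, measure, voiced):
        if self.prev_measure is None:
            diff = 0.0
        else:
            diff = abs(measure - self.prev_measure)   # subtractor 25 (magnitude: an assumption)
        self.prev_measure = measure                   # buffer 24 is then updated
        spectral_change = diff > self.diff_threshold  # thresholder 26: mainly unvoiced speech
        return not (spectral_change or voiced)        # NOR gate 28


gen = ControlSignalGenerator(diff_threshold=0.5)
frames = [(1.0, False), (1.2, False), (2.5, False), (2.6, True), (2.6, False)]
print([gen.noise_present(m, v) for m, v in frames])   # [True, True, False, False, True]
```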

Threshold adapter 29 is also connected to receive the non-speech control output of control signal
generator circuit 20. The output of the threshold adapter 29 is supplied to thresholder 7. The threshold
adapter operates to increment or decrement the threshold in steps which are a proportion of the
instantaneous threshold value, until the threshold approximates the noise power level (which may
conveniently be derived from, for example, weighting and adding circuits 22, 23). When the input signal is
very low, it may be desirable for the threshold to be set automatically to a fixed, low level, since at low
signal levels the effect of signal quantisation produced by ADC 2 can produce unreliable results.
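The stepping rule of threshold adapter 29 can be sketched as a single update function, intended to be called only in frames that control circuit 20 marks as noise. The function name, step fraction, low-signal level and fixed low threshold are all illustrative assumptions.

```python
def adapt_threshold(T, noise_power, signal_power, step_fraction=0.05,
                    low_signal_level=1e-3, low_threshold=1e-2):
    """One update of a hypothetical threshold adapter 29.

    Steps T up or down by a fixed fraction of its current value towards the noise power
    estimate; for a very low input signal it returns a fixed, low threshold instead.
    """
    if signal_power < low_signal_level:   # quantisation makes low-level decisions unreliable
        return low_threshold
    if T < noise_power:
        return T * (1.0 + step_fraction)
    if T > noise_power:
        return T * (1.0 - step_fraction)
    return T


T = 1.0
for _ in range(100):                      # 100 noise frames with noise power 2.0
    T = adapt_threshold(T, noise_power=2.0, signal_power=2.0)
print(T)                                  # within about one step (5%) of 2.0
```

Because each step is proportional to the current threshold, the adapter tracks both loud and quiet noise floors at the same relative rate.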

There may further be provided "hangover" generating means 30, which operates to measure the
duration of indications of speech after thresholder 7 and, when the presence of speech has been indicated
for a period in excess of a predetermined time constant, holds the output high for a short "hangover"
period. In this way, clipping of the middle of low-level speech bursts is avoided, and appropriate selection of
the time constant prevents triggering of the hangover generator 30 by short spikes of noise which are
falsely indicated as speech. It will of course be appreciated that all the above functions may be executed by
a single suitably programmed digital processing means such as a digital signal processing (DSP) chip, as
part of an LPC codec thus implemented (this is the preferred implementation), or by a suitably programmed
microcomputer or microcontroller chip with an associated memory device.
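The hangover behaviour can be expressed as a small state machine on the per-frame output of thresholder 7. This is a hypothetical sketch; the frame counts standing in for the "predetermined time constant" and the hangover period are illustrative.

```python
class HangoverGenerator:
    """Hypothetical sketch of hangover means 30 acting on the output of thresholder 7.

    A speech run longer than `min_frames` arms a hangover of `hang_frames` frames, so a
    brief drop within or after a genuine burst stays marked as speech, while isolated
    noise spikes (runs of at most `min_frames`) never trigger it.
    """

    def __init__(self, min_frames=2, hang_frames=2):
        self.min_frames = min_frames
        self.hang_frames = hang_frames
        self.run = 0          # length of the current run of speech indications
        self.remaining = 0    # hangover frames still to be output

    def step(self, speech):
        if speech:
            self.run += 1
            if self.run > self.min_frames:   # "in excess of a predetermined time constant"
                self.remaining = self.hang_frames
            return True
        self.run = 0
        if self.remaining > 0:
            self.remaining -= 1
            return True
        return False


hg = HangoverGenerator(min_frames=2, hang_frames=2)
burst = [True, True, True, False, False, False, False]
print([hg.step(x) for x in burst])                    # [True, True, True, True, True, False, False]
spike = HangoverGenerator(min_frames=2, hang_frames=2)
print([spike.step(x) for x in [True, False, False]])  # [True, False, False]
```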

Conveniently, as described above, the voice detection apparatus may be implemented as part of an
LPC codec. Alternatively, where autocorrelation coefficients of the signal or related measures (partial
correlation, or "parcor", coefficients) are transmitted to a distant station, the voice detection may take place
remotely from the codec.

Claims 

1. Voice activity detection apparatus comprising:

(i) means (1) for receiving a first, input, signal;

(ii) means (14,15) for periodically adaptively generating a second signal representing an estimated
noise signal component of the first signal;

(iii) means (4,5,6) for periodically forming from the first and second signals a measure M of the
spectral similarity between a portion of the input signal and the said estimated noise signal
component; and

(iv) means (7) for comparing the measure M with a threshold value T to produce an output indicating
the presence or absence of speech;
characterised in that

(v) the apparatus includes analysis means (13,3) operable to produce the coefficients of a filter
having a spectral response which is the inverse of the frequency spectrum of one of the said two
signals; and

(vi) the measure forming means (4,5,6) are operable to produce a measure M which is proportional
to the zero-order autocorrelation (R'_0) of a signal obtained by filtering the other of the said two
signals by a filter having the said coefficients.

2. Apparatus according to claim 1 in which the analysis means (13,3) includes an adaptive filter.

3. Apparatus according to claim 1, in which the generating means (14,15) are operable to compute the
autocorrelation coefficients A_i of the impulse response of the said coefficients and the measure forming
means (4) comprises means for computing the autocorrelation coefficients R_i of the said other signal,
and means (5,6) connected to receive R_i and A_i and to calculate the measure M therefrom.

4. Apparatus according to claim 2 in which the means (4) for computing the autocorrelation coefficients R_i
of the said other signal are arranged (4a,4b) to do so in dependence upon the autocorrelation
coefficients of several successive portions of the signal.

5. Apparatus according to claim 3 or 4, in which

$$
M = R_0 A_0 + 2 \sum_{i=1}^{N} R_i A_i
$$

where A_i represents the ith autocorrelation coefficient of the impulse response of said filter.

6. Apparatus according to claim 3 or 4, in which

$$
M = A_0 + 2 \sum_{i=1}^{N} \frac{R_i}{R_0} A_i
$$

where A_i represents the ith autocorrelation coefficient of the impulse response of said filter.

7. Apparatus according to any one of claims 1 to 6, in which the said one signal is the second,
noise-representing, signal and the said other signal is the first, input, signal.

8. Apparatus according to claim 7, further comprising an input (11) arranged to receive a second input
signal, similarly subject to noise, from which speech is absent, in which the generating means comprise
LPC analysis means (13) for deriving values of A_i from the second input signal.

9. Apparatus according to any one of claims 1 to 7, further comprising a buffer (15) connected to store data from which the autocorrelation coefficients $A_i$ of the said filter response may be obtained, in which the said filter response is periodically calculated from the signal by LPC analysis means (3), the apparatus being so connected and controlled that the measure M is calculated using the said stored data, and the said stored data is updated only from periods in which speech is indicated to be absent.

10. Apparatus according to claim 9 further comprising means (20) for indicating the absence of speech to control the updating of the stored data, the means (20) for indicating the absence of speech being a second voice activity detection means (20).
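Claims 9 and 10 keep the filter data in a buffer and refresh it only when a second, separate detector reports that speech is absent. The abstract describes that second detector as responding to the rate of spectral change between frames. The sketch below covers the gating alone. It assumes that the candidate coefficients come from the current frame's LPC analysis and that the second detector's decision arrives as a boolean. The class and method names are hypothetical.

```python
class NoiseModelBuffer:
    """Stores the A_i used for M and refreshes them only on speech-free frames."""

    def __init__(self, initial_a):
        self.a = [float(v) for v in initial_a]

    def maybe_update(self, candidate_a, second_detector_says_speech):
        """Adopt the candidate A_i only if the second detector reports no speech.

        While speech may be present the previous noise model is kept, so that
        speech energy is not absorbed into the noise estimate.
        """
        if not second_detector_says_speech:
            self.a = [float(v) for v in candidate_a]
        return self.a
```

One likely reason for using a separate detector for this decision is to avoid a feedback loop. If M itself misclassified a frame and that frame were folded into the noise model, further misclassifications would become more likely.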

11. Apparatus according to any one of the preceding claims, further comprising means (29) for adjusting 
the said threshold value T during periods when speech is indicated to be absent. 

12. Apparatus according to claim 11, further comprising second voice activity detection means (20) arranged to prevent adjustment of the threshold value when speech is present.
13. Apparatus according to claim 10 further comprising means (20) for adjusting the said threshold value T
during periods when speech is indicated to be absent, the said second voice activity detection means 
(20) being arranged to prevent adjustment of the threshold value when speech is present. 

14. Apparatus according to claim 11, 12 or 13 in which the threshold value T is, when adjusted, adjusted to
be equal to the mean of the measure plus a term which is a fraction of the standard deviation of the 
measure. 
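Claims 11 to 14 adapt T only while speech is indicated to be absent, setting it to the mean of M plus a fraction of its standard deviation. The sketch below assumes that the statistics are taken over a sliding window of recent speech-free values of M, using the population standard deviation. `fraction` and `window` are hypothetical parameters, because the claims give no values.

```python
import statistics
from collections import deque


def adapted_threshold(measures, fraction):
    """T = mean(M) + fraction * stdev(M) over speech-free values of M."""
    values = list(measures)
    return statistics.fmean(values) + fraction * statistics.pstdev(values)


class ThresholdTracker:
    """Holds T fixed during speech and re-estimates it from speech-free frames."""

    def __init__(self, initial_threshold, fraction, window):
        self.threshold = float(initial_threshold)
        self.fraction = fraction  # assumption: a tuning constant, not from the patent
        self.history = deque(maxlen=window)

    def observe(self, m, second_detector_says_speech):
        if not second_detector_says_speech:
            self.history.append(float(m))
            # At least two samples are needed for a meaningful spread estimate.
            if len(self.history) >= 2:
                self.threshold = adapted_threshold(self.history, self.fraction)
        return self.threshold
```

Placing T a fraction of a standard deviation above the mean keeps ordinary fluctuations of M in noise below the threshold. It still lets the threshold follow slow changes in the background level.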

15. Apparatus according to claim 10, 13 or 14 in which said second voice activity detection means (20) comprises means (4, 21, 21a, 22, 23, 24, 25, 26) for generating a measure of the spectral similarity between a portion of the input signal and earlier portions of the input signal.

16. Apparatus according to claim 15 in which the similarity measure generating means comprises means (4, 21, 22, 23) for providing, from LPC filter data and autocorrelation data relating to a present portion of the input signal, a present distortion measure; means (24) for providing an equivalent past-frame distortion measure corresponding to a preceding portion of the input signal; and means (25, 26) for generating a signal indicating the degree of similarity therebetween as an indicator of speech presence or absence.
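Claim 16 builds the second detector from a distortion measure that is computed from LPC filter data and autocorrelation data and compared with the equivalent measure for a preceding frame. The claims do not define the distortion. The sketch below assumes an Itakura-style log ratio: how much more residual energy the current frame leaves when it is whitened by another frame's LPC filter rather than by its own. The names are hypothetical.

```python
import math


def residual_energy(r, a):
    """a^T R a with Toeplitz R[i][j] = r[|i - j|]: energy left after filtering with a."""
    p = len(a) - 1
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p + 1) for j in range(p + 1))


def itakura_distortion(r_cur, a_cur, a_other):
    """log(residual with another frame's filter / residual with the frame's own filter).

    Assumption: a_cur is the LPC solution for r_cur. The value is then >= 0 and
    equals 0 when the two filters coincide, so small values mean similar spectra.
    """
    return math.log(residual_energy(r_cur, a_other) / residual_energy(r_cur, a_cur))
```

Evaluating this against the previous frame's filter, and comparing the result with the value obtained one frame earlier, would be one way to mirror the claim's pairing of present and past-frame measures. The comparison rule and any limit on it are assumptions.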

17. Apparatus according to claim 15 or 16, in which said second voice activity detection means (20) further
comprises voiced speech detection means (27) comprising pitch analysis means (27), for generating a 
signal indicative of the presence of voiced speech, upon which the output of said second voice activity 
detection means (20) also depends. 
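Claim 17 adds pitch analysis so that voiced speech, which is strongly periodic, also influences the second detector. The sketch below assumes a normalised-autocorrelation pitch detector with a lag range of 20 to 147 samples (roughly 54 to 400 Hz at an 8 kHz sampling rate) and a voicing threshold of 0.5. The range, the threshold and the names are illustrative assumptions, not values from the patent.

```python
import math


def voicing_strength(frame, min_lag, max_lag):
    """Largest normalised autocorrelation of the frame over the candidate pitch lags."""
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        head, tail = frame[:-lag], frame[lag:]
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        if den > 0.0:
            best = max(best, num / den)
    return best


def is_voiced(frame, min_lag=20, max_lag=147, threshold=0.5):
    """Flag a frame as voiced when its periodicity peak exceeds the (assumed) threshold."""
    return voicing_strength(frame, min_lag, max_lag) > threshold
```

A 200 Hz tone sampled at 8 kHz (period 40 samples) gives a peak close to 1 at lag 40. A silent frame gives 0.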

18. A method of detecting voice activity in a first, input, signal, comprising

(a) periodically adaptively generating a second signal representing an estimated noise signal 
component of the first signal; 

(b) periodically forming from the first and second signals a measure M of the spectral similarity 
between a portion of the input signal and the said estimated noise signal component; and 

(c) comparing the measure M with a threshold value T to produce an output indicating the presence or absence of speech;
characterised by 

(d) the step of producing the coefficients of a filter having a spectral response which is the inverse of 
the frequency spectrum of one of the said two signals; and in that 
(e) the measure M is proportional to the zero-order autocorrelation $R'_0$ of a signal obtained by filtering the other of the said two signals with a filter having the said coefficients.

19. Apparatus for encoding speech signals including apparatus according to any one of claims 1 to 17. 

20. Mobile telephone apparatus including apparatus according to any one of claims 1 to 17.

Patentansprüche (German-language claims)

1. Apparatus for detecting the presence of speech, comprising:

(i) means (1) for receiving a first input signal;

(ii) means (14, 15) for periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal;

(iii) means (4, 5, 6) for periodically forming, from the first and second signals, a measure M of the spectral similarity between a portion of the input signal and the estimated noise signal component; and

(iv) means (7) for comparing the measure M with a threshold value T to produce an output indicating the presence or absence of speech;

characterised in that

(v) the apparatus comprises analysis means (13, 3) operable to produce the coefficients of a filter having a spectral response which is the inverse of the frequency spectrum of one of the two signals; and

(vi) the measure-forming means (4, 5, 6) are operable to produce a measure M which is proportional to the zero-order autocorrelation $R'_0$ of a signal obtained by filtering the other of the two signals with a filter having the coefficients.

2. Apparatus according to claim 1, in which the analysis means (13, 3) comprises an adaptive filter.

3. Apparatus according to claim 1, in which the generating means (14, 15) is operable to calculate the autocorrelation coefficients $A_i$ of the impulse response of the coefficients, and in which the measure-forming unit (4) comprises means for calculating the autocorrelation coefficients $R_i$ of the other signal, and means (5, 6) connected to receive $R_i$ and $A_i$ and to calculate the measure therefrom.

4. Apparatus according to claim 2, in which the means (4) for calculating the autocorrelation coefficients $R_i$ of the other signal is arranged (4a, 4b) to do so in dependence upon the autocorrelation coefficients of several successive portions of the signal.

5. Apparatus according to claim 3 or 4, in which:

$$M = R_0 A_0 + 2\sum_{i=1}^{N} R_i A_i$$

where $A_i$ represents the $i$th autocorrelation coefficient of the impulse response of the filter.

6. Apparatus according to claim 3 or 4, in which:

$$M = A_0 + 2\sum_{i=1}^{N} \frac{R_i}{R_0} A_i$$

where $A_i$ represents the $i$th autocorrelation coefficient of the impulse response of the filter.

7. Apparatus according to any one of claims 1 to 6, in which the one signal is the second, noise-representing, signal and the other signal is the first input signal.

8. Apparatus according to claim 7, further comprising an input (11) arranged to receive a second input signal which is similarly subject to noise and from which speech is absent, in which the generating means comprises LPC analysis means (13) for deriving the values of $A_i$ from the second input signal.

9. Apparatus according to any one of claims 1 to 7, further comprising a buffer (15) connected to store data from which the autocorrelation coefficients $A_i$ of the filter response can be obtained, in which the filter response is periodically calculated from the signal by LPC analysis means (3), the apparatus being so connected and controlled that the measure M is calculated using the stored data, and the stored data being updated only from periods in which speech is indicated to be present.

10. Apparatus according to claim 9, further comprising means (20) for indicating the absence of speech in order to control the updating of the stored data, the means (20) for indicating the absence of speech being a second voice activity detection means (20).

11. Apparatus according to any one of the preceding claims, further comprising means (29) for adjusting the threshold value T during periods when speech is indicated to be absent.

12. Apparatus according to claim 11, further comprising a second detection means (20) for the presence of speech, arranged to prevent adjustment of the threshold value when speech is present.

13. Apparatus according to claim 10, further comprising means (20) for adjusting the threshold value T during periods in which speech is indicated to be present, the second detection means (20) for the presence of speech being arranged to prevent adjustment of the threshold value when speech is present.

14. Apparatus according to claims 11, 12 or 13, in which the threshold value T, when adjusted, is adjusted to be equal to the mean of the measure plus a term which is a fraction of the standard deviation of the measure.

15. Apparatus according to claim 10, 13 or 14, in which the second voice activity detection means (20) comprises means (4, 21, 21a, 22, 23, 24, 25, 26) for generating a measure of the spectral similarity between a portion of the input signal and earlier portions of the input signal.

16. Apparatus according to claim 15, in which the similarity-measure generating means comprises means (4, 21, 22, 23) for providing, from LPC filter data and autocorrelation data relating to a present portion of the input signal, a present distortion measure; means (24) for providing an equivalent past-frame distortion measure corresponding to a preceding portion of the input signal; and means (25, 26) for generating a signal which indicates the degree of similarity between them as an indicator of the presence or absence of speech.

17. Apparatus according to claim 15 or 16, in which the second detection means (20) for the presence of speech further comprises voiced-speech detection means (27) comprising pitch analysis means (27) for generating a signal indicating the presence of voiced speech, on whose output the second detection means (20) for the presence of speech also depends.

18. A method of detecting the presence of speech in a first input signal, comprising:

(a) periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal;

(b) periodically forming, from the first and second signals, a measure M of the spectral similarity between a portion of the input signal and the estimated noise signal component; and

(c) comparing the measure M with a threshold value T to produce an output indicating the presence or absence of speech;

characterised by

(d) the step of producing the coefficients of a filter having a spectral response which is the inverse of the frequency spectrum of one of the two signals; and in that

(e) the measure M is proportional to the zero-order autocorrelation $R'_0$ of a signal obtained by filtering the other of the two signals with a filter having the coefficients.

19. Apparatus for coding speech signals, comprising apparatus according to any one of claims 1 to 17.

20. Mobile telephone apparatus comprising apparatus according to any one of claims 1 to 17.
Revendications (French-language claims)

1. Voice activity detection apparatus comprising:

(i) means (1) for receiving a first input signal;

(ii) means (14, 15) for periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal;

(iii) means (4, 5, 6) for periodically forming, from the first and second signals, a measure M of the spectral similarity between a portion of the input signal and said estimated noise signal component; and

(iv) means (7) for comparing the measure M with a threshold value T in order to produce an output indicating the presence or absence of speech;

characterised in that:

(v) the apparatus includes analysis means (13, 3) operable to produce the coefficients of a filter whose spectral response is the inverse of the frequency spectrum of one of said two signals;

(vi) the measure-forming means (4, 5, 6) are operable to produce a measure M which is proportional to the zero-order autocorrelation ($R'_0$) of a signal obtained by filtering the other of said two signals by means of a filter having said coefficients.

2. Apparatus according to claim 1, in which the analysis means (13, 3) includes an adaptive filter.

3. Apparatus according to claim 1, in which the generating means (14, 15) are operable to calculate the autocorrelation coefficients $A_i$ of the impulse response of said coefficients, and the measure-forming means (4) comprises means for calculating the autocorrelation coefficients ($R_i$) of said other signal, and the means (5, 6) are connected so as to receive $R_i$ and $A_i$ and to calculate the measure M from them.

4. Apparatus according to claim 2, in which the means (4) for calculating the autocorrelation coefficients $R_i$ of said other signal are arranged (4a, 4b) to calculate them as a function of the autocorrelation coefficients of several other portions of the signal.

5. Apparatus according to claim 3 or 4, in which

$$M = R_0 A_0 + 2\sum_{i=1}^{N} R_i A_i$$

where $A_i$ represents the $i$th autocorrelation coefficient of the impulse response of said filter.

6. Apparatus according to claim 3 or 4, in which

$$M = A_0 + 2\sum_{i=1}^{N} \frac{R_i}{R_0} A_i$$

where $A_i$ represents the $i$th autocorrelation coefficient of the impulse response of said filter.

7. Apparatus according to any one of claims 1 to 6, in which said one signal is the second, noise-representing, signal and said other signal is the first, input, signal.

8. Apparatus according to claim 7, further comprising an input (11) arranged to receive a second input signal, also subject to noise, from which speech is absent, in which the generating means comprises linear predictive coding (LPC) analysis means (13) for deriving values of $A_i$ from the second input signal.

9. Apparatus according to any one of claims 1 to 7, further comprising a buffer (15) connected so as to store data from which the autocorrelation coefficients $A_i$ of said filter response can be obtained, in which said filter response is calculated periodically from the signal by linear predictive coding signal analysis means (3), the apparatus being connected and controlled in such a way that the measure M is calculated using said stored data, and the stored data is updated only from periods in which speech is indicated as absent.

10. Apparatus according to claim 9, further comprising means (20) for indicating the absence of speech in order to control the updating of the stored data, the means (20) for indicating the absence of speech being a second voice activity detection means (20).

11. Apparatus according to any one of the preceding claims, further comprising a second means (29) for adjusting said threshold value T during periods in which speech is indicated as absent.

12. Apparatus according to claim 11, further comprising a second voice activity detection means (20) arranged to prevent adjustment of the threshold value when speech is present.

13. Apparatus according to claim 10, further comprising means (20) for adjusting said threshold value T during periods in which speech is indicated as absent, said second voice activity detection means (20) being arranged to prevent adjustment of the threshold value when speech is present.

14. Apparatus according to claim 11, 12 or 13, in which the threshold value T is, when adjusted, adjusted so as to be equal to the mean of the measure increased by a term which is a fraction of the standard deviation of the measure.

15. Apparatus according to claim 10, 13 or 14, in which said second voice activity detection means (20) comprises means (4, 21, 21a, 22, 23, 24, 25, 26) for generating a measure of spectral similarity between a portion of the input signal and earlier portions of the input signal.

16. Apparatus according to claim 15, in which the similarity-measure generating means comprises means (4, 21, 22, 23) for producing a present distortion measure from linear predictive coding filter data and autocorrelation data relating to a present portion of the input signal; means (24) for producing an equivalent past-frame distortion measure corresponding to a preceding portion of the input signal; and means (25, 26) for generating a signal indicating the degree of similarity between these measures as an indicator of the presence or absence of speech.

17. Apparatus according to claim 15 or 16, in which said second voice activity detection means (20) further comprises voiced-speech detection means (27) comprising pitch analysis means for generating a signal indicative of the presence of voiced speech, on which the output of the second voice activity detection means (20) also depends.

18. A method of detecting voice activity in a first, input, signal, comprising the steps of:

(a) periodically and adaptively generating a second signal representing an estimated noise signal component of the first signal;

(b) periodically forming, from the first and second signals, a measure M of the spectral similarity between a portion of the input signal and said estimated noise signal component; and

(c) comparing the measure M with a threshold value T in order to produce an output indicating the presence or absence of speech;

characterised by

(d) the step of producing the coefficients of a filter whose spectral response is the inverse of the frequency spectrum of one of said two signals; and by the fact that

(e) the measure M is proportional to the zero-order autocorrelation $R'_0$ of a signal obtained by filtering the other of said two signals by a filter having said coefficients.

19. Apparatus for encoding speech signals, including apparatus according to any one of claims 1 to 17.

20. Mobile telephone apparatus including apparatus according to any one of claims 1 to 17.